Non-human identity: what it means for your audit trail (on BigQuery)

When a service account runs a query that leaks customer data, the investigation can cost weeks of engineering time and expose the organization to regulatory fines. Without a reliable audit trail, you cannot trace the origin of that leak. The lack of a clear, per‑request record makes it impossible to pinpoint which pipeline, commit, or automated job caused the breach, turning a simple data leak into a costly, reputation‑damaging incident.

Non‑human identities (service accounts, CI/CD tokens, and AI‑driven agents) are essential for modern data pipelines. They are granted long‑lived credentials so that nightly jobs can load data, dashboards can refresh, and alerts can fire without human interaction. From the platform’s perspective these identities are just strings of characters, but from a compliance standpoint they must be treated as actors whose actions need to be traced.

In a typical deployment the token for a service account is exchanged for a short‑lived credential that the job uses to call the BigQuery API directly. The audit log that BigQuery produces records the service account name, but it does not capture the surrounding context: which CI job invoked the query, which Git commit triggered it, or whether an operator approved the operation. The result is an audit trail that tells you *who* accessed the data, but not *why* or *how* the request originated.

This gap is the starting point for many teams. They have comprehensive identity provisioning (OIDC, SAML, IAM roles) and least‑privilege scopes for each service account, yet every request still travels straight to BigQuery with no enforcement point in between. No gateway exists to enrich the log with request‑level metadata, to mask sensitive result fields, or to require a manual approval before a destructive query runs. The audit trail remains incomplete, and the organization cannot reliably answer questions about intent or responsibility.

Introducing a data‑path gateway

hoop.dev provides the missing enforcement layer by sitting in the data path between the identity provider and BigQuery. When a non‑human identity initiates a query, the request is first routed through hoop.dev. The gateway validates the OIDC token, extracts the service account identity, and then forwards the query to BigQuery on behalf of the caller.
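The identity-extraction step can be illustrated with a minimal sketch. It is not hoop.dev's implementation: the function names, the choice of the `email` and `sub` claims, and the audience value are assumptions made for illustration, and signature verification (which a real gateway must perform against the identity provider's published keys) is deliberately left out.

```python
import base64
import json
import time
from typing import Optional


def decode_claims(token: str) -> dict:
    """Decode the payload segment of a JWT. Does NOT verify the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT")
    # JWT segments are unpadded base64url; restore padding before decoding.
    payload = parts[1] + "=" * (-len(parts[1]) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def caller_identity(claims: dict, expected_audience: str,
                    now: Optional[float] = None) -> str:
    """Apply basic claim checks, then return the caller's identity."""
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        raise PermissionError("token expired")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        raise PermissionError("unexpected audience")
    # Assumption: the service account appears in "email", else in "sub".
    identity = claims.get("email") or claims.get("sub")
    if not identity:
        raise PermissionError("no identity claim present")
    return identity
```

Only after checks like these pass would the gateway forward the query, tagged with the returned identity.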

How hoop.dev creates a reliable audit trail

Because hoop.dev is the only point where traffic reaches the database, it can record every session in detail. hoop.dev captures the full query text, the exact timestamp, the originating service account, and any additional metadata supplied by the CI system (for example, pipeline ID or Git SHA). It stores this information in an audit log that can be queried later for forensic analysis.
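To make the shape of such a record concrete, here is a hedged sketch of one audit entry serialized as a JSON line. The field names (`service_account`, `query_text`, `timestamp`, `metadata`) and the JSON-lines format are assumptions for illustration, not hoop.dev's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class AuditRecord:
    service_account: str   # identity taken from the validated token
    query_text: str        # full SQL text as submitted
    timestamp: str         # ISO 8601, UTC
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. pipeline ID, Git SHA


def build_record(service_account: str, query_text: str,
                 metadata: Optional[Dict[str, str]] = None,
                 now: Optional[datetime] = None) -> AuditRecord:
    """Create a record stamped with the current UTC time unless `now` is given."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(service_account, query_text, now.isoformat(),
                       dict(metadata or {}))


def to_json_line(record: AuditRecord) -> str:
    """Serialize a record as one line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Writing one self-describing line per query keeps the log append-only and easy to search during an investigation.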

Beyond plain logging, hoop.dev can apply inline masking to any fields that contain personally identifiable information, ensuring that downstream analysts never see raw PII. It can also enforce just‑in‑time approvals: if a query matches a risky pattern, hoop.dev pauses execution and routes the request to a human reviewer before allowing it to proceed.
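Inline masking can be pictured as a filter applied to each row as it streams back to the client. The sketch below assumes rows arrive as dictionaries and that sensitive columns are known by name; both the function and the placeholder value are hypothetical, and a real gateway may detect PII differently.

```python
from typing import Any, Dict, Iterable, Iterator, Set


def mask_rows(rows: Iterable[Dict[str, Any]],
              sensitive_fields: Set[str],
              placeholder: str = "***") -> Iterator[Dict[str, Any]]:
    """Yield rows one at a time with sensitive columns replaced.

    Being a generator, it never holds the full result set in memory.
    """
    for row in rows:
        yield {
            key: (placeholder if key in sensitive_fields else value)
            for key, value in row.items()
        }
```

For example, masking `{"email"}` over a row `{"id": 1, "email": "a@example.com"}` yields `{"id": 1, "email": "***"}`, so the raw address never reaches the caller.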

Enforcement outcomes that matter

  • hoop.dev records each query, creating a complete audit trail that ties every data access back to a specific non‑human identity and its execution context.
  • hoop.dev masks sensitive result columns in real time, reducing exposure of regulated data.
  • hoop.dev blocks disallowed statements such as DROP TABLE or DELETE unless they have prior approval, preventing accidental or malicious data loss.
  • hoop.dev retains session recordings that can be replayed to understand exactly what happened during a breach investigation.
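The approval gate for destructive statements can be sketched as a small policy check. This is an illustration under stated assumptions: the pattern covers only the two statements named above, `evaluate` and `Decision` are hypothetical names, and matching on the leading keyword is naive (a leading SQL comment would slip past it, so a production gateway would need a real SQL parser).

```python
import re
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"


# Matches statements that begin with DROP TABLE or DELETE, case-insensitively.
RISKY_STATEMENT = re.compile(r"^\s*(DROP\s+TABLE|DELETE)\b", re.IGNORECASE)


def evaluate(query: str, approved: bool = False) -> Decision:
    """Hold risky statements until a reviewer has approved them."""
    if RISKY_STATEMENT.search(query) and not approved:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```

A query that returns `NEEDS_APPROVAL` would be paused and routed to a reviewer; once approved, the same call with `approved=True` lets it through.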

Why this matters for compliance and risk management

Regulators increasingly require evidence that every data access is attributable to a distinct actor and that high‑risk operations are reviewed. By routing non‑human traffic through hoop.dev, organizations gain the concrete logs needed for SOC 2, GDPR, and other audit frameworks without having to instrument each individual pipeline. The cost of investigating a breach also falls when the root cause can be identified in minutes rather than days.

Getting started

Deploy the gateway near your BigQuery instance using the Docker Compose quickstart or a Kubernetes manifest. Configure your CI system to obtain an OIDC token for the service account and point the BigQuery client at the hoop.dev endpoint instead of the native API URL. Detailed steps are available in the getting‑started guide and the broader learn section, which cover identity setup, agent deployment, and policy definition.
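On the CI side, a job might gather its execution context before sending queries through the gateway. The sketch below reads two default GitHub Actions variables (`GITHUB_SHA` and `GITHUB_RUN_ID`) as an example; the output keys `git_sha` and `pipeline_id` are assumptions, and how this metadata is actually handed to hoop.dev is covered by its documentation, not by this snippet.

```python
import os
from typing import Dict, Mapping, Optional

# Output key -> environment variable. These names are GitHub Actions
# defaults; other CI systems expose equivalents under different names.
CI_VARIABLES = {
    "git_sha": "GITHUB_SHA",
    "pipeline_id": "GITHUB_RUN_ID",
}


def ci_metadata(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect whichever CI context variables are set and non-empty."""
    env = os.environ if env is None else env
    return {key: env[var] for key, var in CI_VARIABLES.items() if env.get(var)}
```

Outside CI the function simply returns an empty dictionary, so the same job code can run locally without special handling.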

Once the gateway is in place, you can define masking rules, approval workflows, and query‑level blocklists that align with your organization’s risk posture. All of this runs on open‑source software, so you retain full control over the implementation.

Explore the source code and contribute to the project on GitHub in the hoophq/hoop repository.

FAQ

How does hoop.dev differentiate between multiple service accounts?

hoop.dev extracts the identity claim from the OIDC token presented by each client. It records that claim alongside the query, so you can filter logs by exact service account or by the CI pipeline that generated the token.
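As a rough illustration of that kind of filtering, the sketch below assumes audit entries have been exported as JSON lines with a `service_account` field and a `metadata.pipeline_id` field. Both the export format and the field names are hypothetical, not hoop.dev's documented schema.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def filter_records(lines: Iterable[str],
                   service_account: Optional[str] = None,
                   pipeline_id: Optional[str] = None) -> Iterator[Dict[str, Any]]:
    """Yield records matching every filter that was supplied."""
    for line in lines:
        record = json.loads(line)
        if service_account and record.get("service_account") != service_account:
            continue
        if pipeline_id and record.get("metadata", {}).get("pipeline_id") != pipeline_id:
            continue
        yield record
```

Passing only `service_account` returns everything that identity ran; adding `pipeline_id` narrows the result to a single CI run.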

Does hoop.dev store query results?

hoop.dev records metadata about each query and can mask fields in the response, but it does not act as a data warehouse. The actual result set is streamed directly to the client after any masking rules are applied.

Can I add audit‑trail coverage to existing pipelines without rewriting them?

Yes. By updating the connection string to point at the hoop.dev endpoint, existing jobs automatically route through the gateway and gain full audit‑trail visibility without code changes.

Open source

Save the open-source gateway for agent data access

Hoop is MIT-licensed infrastructure for controlling how AI agents reach production data. Star hoophq/hoop so you can inspect it, deploy it, or share it when your team starts governing agent access.