Non-human identity: what it means for your data exfiltration risk (on BigQuery)

When every query that touches BigQuery is executed only by verified service accounts and each result is automatically inspected for sensitive payloads, data exfiltration becomes a controlled exception rather than an inevitable leak.

In many organizations, the reality looks very different. Engineers create a handful of long‑lived service accounts, embed their keys in CI pipelines, and hand those credentials to downstream jobs. The same credentials are then reused by scheduled notebooks, automated ETL jobs, and even experimental scripts. Because the credentials are static and widely shared, a compromised build server or a misconfigured notebook can issue a query that pulls millions of rows and ships them to an external bucket, all without any human ever seeing the request.

Non‑human identities (service accounts, workload identities, and AI agents) are essential for automation, but they do not inherit the contextual checks that a human user would trigger at a login screen. Modern identity providers let you define these identities, attach them to groups, and enforce least‑privilege scopes. That setup tells the system *who* is making the request, and it can block a service account from accessing a table it does not need. However, the request still travels directly to BigQuery, bypassing any real‑time inspection, masking, or content‑aware audit trail. That gap leaves data exfiltration possible whenever a legitimate credential is misused.

Why data exfiltration is a unique risk for non‑human identities

Service accounts are designed to be long‑running and to have programmatic access. Because they lack a user interface, they cannot be prompted for multi‑factor authentication or consent dialogs. If an attacker obtains the private key, they inherit exactly the same privileges the account was granted. The attacker can therefore launch a bulk export, embed the data in a log file, or stream it to an external endpoint, all under the guise of a legitimate job.

Traditional database auditing records who connected and when, but it often does not capture the *content* of the query or the *result set* that left the system. Without inline data masking or result‑level inspection, sensitive columns (PII, financial data, health records) can be exfiltrated in plain text. The combination of static credentials and a lack of content‑aware controls makes non‑human identities the most attractive vector for large‑scale data theft.

How a gateway in the data path stops data exfiltration

hoop.dev is built to sit in the one place where enforcement can happen: the data path between the workload and BigQuery. By proxying every connection, hoop.dev can inspect the query before it reaches the database, apply inline masking to response fields, and require a just‑in‑time approval workflow for risky operations.

Session recording: hoop.dev records each query and its result set, providing a replay that auditors can review.
Inline masking: Sensitive columns are redacted or tokenized in real time, so even a compromised service account never sees raw PII.
Just‑in‑time approval: When a query matches a policy that could lead to bulk export, hoop.dev pauses the request and routes it to an authorized human for approval.
Query‑level audit: Every command, including the exact SQL text and the identity that issued it, is logged with timestamps and outcome.

All of these outcomes exist only because hoop.dev is positioned in the data path. The setup of service accounts alone cannot provide them; the gateway is the active enforcer.
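
To make the inline-masking idea concrete, here is a minimal sketch of what tokenizing sensitive columns in a result set could look like. It is not hoop.dev's implementation or API: the column names, the `MASKING_KEY` value, and the `mask_row` / `mask_result` helpers are all hypothetical and exist only to illustrate the technique. A real deployment would load the key from a secret store.

```python
import hashlib
import hmac

# Hypothetical configuration: which result columns count as sensitive.
SENSITIVE_COLUMNS = frozenset({"email", "ssn"})

# Hypothetical key, hard-coded only for this sketch.
MASKING_KEY = b"example-only-key"


def tokenize(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace a value with a stable token derived from a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:12]


def mask_row(row: dict, sensitive=SENSITIVE_COLUMNS) -> dict:
    """Return a copy of the row with sensitive, non-null values tokenized."""
    return {
        col: tokenize(str(val)) if col in sensitive and val is not None else val
        for col, val in row.items()
    }


def mask_result(rows: list) -> list:
    """Mask every row of a result set before handing it to the caller."""
    return [mask_row(row) for row in rows]
```

Because the token is deterministic for a given key, the same input always maps to the same token, so downstream jobs can still join or group on a masked column without seeing the raw value.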

Practical steps to secure non‑human access to BigQuery

Define each automation workload as a distinct non‑human identity in your IdP. Assign the minimal set of BigQuery roles required for the job.
Configure hoop.dev to proxy BigQuery connections. The gateway holds the service‑account credential, so the workload never sees the raw key.
Create policies that flag queries accessing high‑value tables or returning more than a threshold number of rows. Tie those policies to just‑in‑time approval.
Enable inline masking for columns that contain regulated data. hoop.dev will automatically replace those values before the results reach the workload.
Review session recordings regularly or integrate them with your SIEM. The logs give you a complete picture of what each service account actually did.
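
The policy step above (flag queries that touch high‑value tables or return more than a threshold number of rows) can be sketched as a small decision function. This is an illustration under stated assumptions, not hoop.dev's policy format: `Policy`, `Decision`, `referenced_tables`, and `evaluate` are hypothetical names, the row count is assumed to come from some estimate available before execution, and the regular expression is a deliberately naive stand-in for a real SQL parser.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    high_value_tables: frozenset  # lowercase names, e.g. "analytics.customers"
    max_rows: int                 # row threshold above which approval is needed


@dataclass(frozen=True)
class Decision:
    action: str     # "allow" or "require_approval"
    reasons: tuple  # human-readable explanations for the approver


# Naive table extraction: the identifier following FROM or JOIN.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+`?([\w.\-]+)`?", re.IGNORECASE)


def referenced_tables(sql: str) -> set:
    """Return the lowercase table names the query appears to read from."""
    return {name.lower() for name in _TABLE_REF.findall(sql)}


def evaluate(sql: str, estimated_rows: int, policy: Policy) -> Decision:
    """Decide whether a query may run directly or needs human approval."""
    reasons = []
    hits = referenced_tables(sql) & policy.high_value_tables
    if hits:
        reasons.append("touches high-value table(s): " + ", ".join(sorted(hits)))
    if estimated_rows > policy.max_rows:
        reasons.append(
            f"estimated {estimated_rows} rows exceeds limit of {policy.max_rows}"
        )
    return Decision("require_approval" if reasons else "allow", tuple(reasons))
```

With a policy of `Policy(frozenset({"analytics.customers"}), 10000)`, a `SELECT * FROM analytics.customers` is routed to approval even for a small result, while a ten‑row query against an unlisted table is allowed. Returning the reasons alongside the action gives the approver the context needed to decide quickly.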

For a step‑by‑step walkthrough of the deployment, see the getting‑started guide. The feature overview explains how masking, approvals, and session replay work together to eliminate data exfiltration risk.

FAQ

Do I need to change existing service‑account keys?

No. hoop.dev can import the existing credential and use it internally. The key never leaves the gateway, so existing automation continues to work while gaining the additional controls.

Can hoop.dev block a query that has already been approved?

Yes. Policies are evaluated at request time. If a later policy change marks the same query as risky, hoop.dev will require a new approval before the next execution.
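
One way to picture request‑time evaluation is to bind each approval to the policy version that was in force when it was granted. The sketch below is an assumption about how such a mechanism could work, not a description of hoop.dev internals; `ApprovalStore` and the `policy_version` string are hypothetical.

```python
import hashlib


class ApprovalStore:
    """Remembers approvals keyed by identity, query text, and policy version."""

    def __init__(self) -> None:
        self._approved = set()

    @staticmethod
    def _key(identity: str, sql: str, policy_version: str) -> str:
        # Collapse whitespace so trivially reformatted queries share a key.
        normalized = " ".join(sql.split())
        raw = "\n".join((identity, normalized, policy_version))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def approve(self, identity: str, sql: str, policy_version: str) -> None:
        """Record that a human approved this query under this policy version."""
        self._approved.add(self._key(identity, sql, policy_version))

    def is_approved(self, identity: str, sql: str, policy_version: str) -> bool:
        """Check at request time whether a matching approval exists."""
        return self._key(identity, sql, policy_version) in self._approved
```

Because the policy version is part of the key, an approval granted under version "v1" no longer matches once the policy moves to "v2", so the next execution of the same query needs a fresh approval.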

Is the audit data stored in a compliant way?

hoop.dev generates evidence that satisfies SOC 2 Type II requirements and can be exported to any storage solution you choose for long‑term retention.

Explore the source code and contribute on GitHub: https://github.com/hoophq/hoop