
Non-human identity: what it means for data exfiltration in CI/CD pipelines

How does a service account that runs your build pipeline become a conduit for data exfiltration?

When a CI/CD runner checks out code, spins up a temporary container, and talks to a database, it does so under a non‑human identity – a service account, a GitHub token, or an OIDC‑issued workload identity. Those identities are convenient, but they also present a silent attack surface. If an attacker compromises the runner, the credential it carries can be used to pull production data, copy logs, or even exfiltrate entire tables without ever touching a human user’s account.

Why data exfiltration is a real risk with non‑human identities

Non‑human identities are typically granted broad, standing permissions because they need to run many jobs. The permission set often includes read access to every database used by the organization, write access to storage buckets, and the ability to invoke internal APIs. This “set‑and‑forget” model means that once the credential is in the runner’s memory, nothing stops a malicious actor from issuing a SELECT * FROM users query and streaming the result to an external endpoint.

Two concrete problems arise:

  • Unlimited reach. The runner can connect directly to the target service, bypassing any audit or approval step. The request flows straight from the CI agent to the database.
  • No visibility. Because the connection is direct, the pipeline keeps no built‑in record of which command was executed, which rows were returned, or who initiated the request. Unless the target has its own audit logging enabled, the organization has little forensic evidence if data leaves the environment.

Both issues exist even when the CI system enforces strong authentication for the service account. Authentication tells you who can start a connection, but it does not control what that connection can do once it reaches the data store.

What a server‑side gateway must provide

The missing piece is a data‑path enforcement layer that sits between the non‑human identity and the target resource. Such a gateway must be able to:

  • Inspect each command in real time and block anything that violates policy.
  • Require a human approval workflow for high‑risk queries, such as bulk exports.
  • Mask sensitive fields in responses so that even a compromised runner only sees redacted data.
  • Record the entire session – the exact command, the returned rows, and the identity that issued it – for replay and audit.

These capabilities turn a static credential into a just‑in‑time, auditable access point. The enforcement happens at the gateway, not in the CI runner or the database, which means the runner never sees the raw credential or the unfiltered data.
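The command-inspection step can be sketched in a few lines of Python. This is an illustration of the idea only, not hoop.dev's implementation: the `Decision` values, the `Request` type, and the regular expressions are all hypothetical, and a production gateway would more plausibly parse the protocol and the SQL than pattern-match on text.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


# Hypothetical rules for illustration; real policies would be configurable.
DENY_PATTERNS = [
    re.compile(r"^\s*drop\s+(database|table)\b", re.IGNORECASE),
    re.compile(r"^\s*truncate\b", re.IGNORECASE),
]
BULK_READ = re.compile(r"^\s*select\s+\*\s+from\s+\w+", re.IGNORECASE)
HAS_FILTER = re.compile(r"\b(where|limit)\b", re.IGNORECASE)


@dataclass(frozen=True)
class Request:
    identity: str  # e.g. the subject of the runner's token
    command: str   # the statement the runner wants to execute


def evaluate(request: Request) -> Decision:
    """Classify a command before it is forwarded to the target."""
    cmd = request.command
    if any(p.search(cmd) for p in DENY_PATTERNS):
        return Decision.DENY
    # An unfiltered SELECT * is treated as a candidate bulk export.
    if BULK_READ.search(cmd) and not HAS_FILTER.search(cmd):
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```

Under these assumed rules, `DROP DATABASE prod` is denied, `SELECT * FROM users` is held for approval, and `SELECT id FROM users WHERE id = 1` passes straight through. The point is that the decision is made outside the runner, so a compromised runner cannot skip it.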

How hoop.dev implements the data‑path control

hoop.dev is built exactly for this purpose. It acts as a Layer 7 gateway that proxies connections from CI/CD agents to databases, Kubernetes clusters, SSH hosts, and other supported services. The gateway validates the OIDC or SAML token presented by the runner, extracts group membership, and then applies policy before the request reaches the target.

Because hoop.dev sits in the data path, it can enforce every outcome listed above:

  • Session recording. hoop.dev captures each query and its result, storing a replayable log that auditors can review.
  • Inline masking. When a query returns columns marked as sensitive, hoop.dev redacts those fields on the fly, ensuring that downstream processes only see sanitized data.
  • Just‑in‑time approval. If a job attempts to export more than a configured threshold, hoop.dev routes the request to a human approver before allowing it to proceed.
  • Command blocking. Dangerous commands such as DROP DATABASE or bulk SELECTs can be denied automatically based on policy.

All of these controls depend on hoop.dev being the single component through which traffic between the non‑human identity and the target flows. The CI runner authenticates, but the gateway decides what to let through.
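To make inline masking and session recording concrete, here is a minimal Python sketch of what a proxy could do with a result set before handing it back to the runner. It is not taken from hoop.dev: the column names in `SENSITIVE_COLUMNS`, the `***` mask, and the JSON log layout are assumptions chosen for the example, and the log here stores the masked rows rather than the raw ones.

```python
import json
from datetime import datetime, timezone

SENSITIVE_COLUMNS = {"email", "ssn"}  # hypothetical policy
MASK = "***"


def mask_rows(rows, sensitive=SENSITIVE_COLUMNS):
    """Return copies of the rows with sensitive column values replaced."""
    return [
        {col: (MASK if col in sensitive else val) for col, val in row.items()}
        for row in rows
    ]


def audit_record(identity, command, rows):
    """Build one JSON line describing a single command and its result."""
    return json.dumps(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "identity": identity,
            "command": command,
            "rows": rows,
        },
        sort_keys=True,
    )


def handle_result(identity, command, rows, log):
    """Mask the rows, append an audit line, and return what the runner sees."""
    masked = mask_rows(rows)
    log.append(audit_record(identity, command, masked))
    return masked
```

Because masking and logging happen in the same place, the audit trail and the data the runner receives cannot drift apart: the record is written by the proxy, not by the job that issued the query.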

Practical steps to protect your pipelines

To reduce the risk of data exfiltration from CI/CD pipelines, follow this high‑level approach:

  1. Identify every non‑human identity used by your pipelines – service accounts, OIDC workload identities, GitHub tokens, etc.
  2. Scope those identities to the minimum set of permissions required for each job. Avoid granting blanket SELECT privileges across production databases.
  3. Deploy hoop.dev as a gateway in front of each target service that the pipelines need to reach.
  4. Configure policies that require approval for bulk data reads and that mask columns containing PII, secrets, or other sensitive data.
  5. Enable session recording and integrate the logs with your SIEM or audit platform for continuous visibility.

By moving the enforcement point to hoop.dev, you keep the credential and the raw data out of the runner’s process, while still allowing the pipeline to perform its job.
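Steps 1 and 2 lend themselves to a simple script. The sketch below assumes you can export each identity's grants as `resource:action` strings; that format, the `Identity` type, and the example names are hypothetical and would need to be adapted to whatever your cloud provider or database actually reports.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    name: str
    kind: str  # e.g. "service_account", "oidc", "token"
    grants: set = field(default_factory=set)  # "resource:action" strings


def overbroad(identity, needed):
    """Return the grants the identity holds but the job does not need."""
    return sorted(identity.grants - needed)


def has_wildcard(identity):
    """Flag identities whose grants contain a wildcard."""
    return any("*" in grant for grant in identity.grants)


runner = Identity(
    name="build-runner",
    kind="service_account",
    grants={"db.orders:read", "db.users:read", "bucket.artifacts:write"},
)
# What the build job actually needs, according to its owner.
needed = {"db.orders:read", "bucket.artifacts:write"}
excess = overbroad(runner, needed)
```

In this example `excess` contains `db.users:read`, which is the grant to revoke before putting the gateway in front of the database.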

FAQ

Q: Do I need to change my existing CI scripts to use hoop.dev?
A: No. hoop.dev works with the standard client tools that your pipelines already use – psql, mysql, kubectl, ssh, and others. The gateway is transparent to the application code; you only point the client to the gateway’s address.

Q: Will masking affect the correctness of my tests?
A: Masking is applied only to fields you mark as sensitive. Non‑sensitive data flows unchanged, so functional tests that rely on those fields continue to pass.

Q: How does hoop.dev store the session logs?
A: The logs are written to a storage backend configured by the operator. The important point is that the logs are created by hoop.dev, not by the CI runner, providing a reliable audit trail.

For a quick start, see the getting‑started guide. Detailed policy examples are available in the Learn section.

Explore the source code and contribute on GitHub.

