AI coding agents: what they mean for data exfiltration risk on Snowflake

An engineering team recently added an AI coding assistant to their CI pipeline. The assistant writes Snowflake queries on behalf of developers, pulling column names from schema introspection and inserting sample data for test runs. Because the pipeline runs under a shared service account, the assistant inherits full read‑write privileges on the data warehouse. This scenario illustrates a classic data exfiltration risk, where automated code silently moves sensitive records out of Snowflake. When a developer pushes a change, the AI automatically executes the generated query and stores the result in a temporary table. The next job in the pipeline extracts that table and ships the CSV to an external storage bucket that is not covered by the same retention policy. The organization discovers weeks later that a large portion of customer PII has been copied to an uncontrolled location.

Data exfiltration via AI coding agents

AI coding agents excel at generating code quickly, but they also inherit whatever permissions the underlying credential provides. When those credentials are overly permissive, the agent can read entire tables, export them, or even drop data. Because the agent operates programmatically, the activity blends in with normal batch jobs, making it hard for a human reviewer to spot the unusual data movement. The risk is amplified when the agent is granted access to a Snowflake account that stores regulated or personally identifiable information.
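One way to spot an overly permissive credential is to review the grants held by the agent's role. The following is a minimal sketch, assuming grant rows shaped like the output of Snowflake's SHOW GRANTS TO ROLE (only the `privilege`, `granted_on`, and `name` keys are used). The set of risky privileges, the function name `risky_grants`, and the sample rows are hypothetical illustrations, not a recommended policy.

```python
# Privileges a read-only CI agent should not need.
# This set is an assumed policy for illustration; adjust it to your own rules.
RISKY_PRIVILEGES = {"OWNERSHIP", "INSERT", "UPDATE", "DELETE", "TRUNCATE"}


def risky_grants(grants: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the grants whose privilege exceeds read-only access."""
    return [g for g in grants if g["privilege"].upper() in RISKY_PRIVILEGES]


# Hypothetical rows, as they might be fetched for the shared service account's role.
sample_grants = [
    {"privilege": "SELECT", "granted_on": "TABLE", "name": "PROD.PUBLIC.CUSTOMERS"},
    {"privilege": "DELETE", "granted_on": "TABLE", "name": "PROD.PUBLIC.CUSTOMERS"},
    {"privilege": "OWNERSHIP", "granted_on": "SCHEMA", "name": "PROD.PUBLIC"},
]

flagged = risky_grants(sample_grants)
for grant in flagged:
    print(f"review: {grant['privilege']} on {grant['granted_on']} {grant['name']}")
```

A check like this only narrows what the agent can do. It does not limit how much data a permitted SELECT can return.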

Why traditional controls fall short

Most teams rely on three layers of protection: identity federation, role‑based access control inside Snowflake, and audit logging. Identity federation ensures that only authenticated identities can request a token. Snowflake roles limit what each identity can do. Audit logs record who ran which query. However, these controls assume that the request reaches Snowflake directly from the identity holder. In practice, the AI agent uses a static service account token, so no real‑time approval step ever applies. The request still travels straight to Snowflake, so there is no point at which it can be inspected, masked, or blocked based on its content. The audit log records that a query ran, but it does not prevent the query from running, nor does it hide sensitive columns in the response.

Placing a gateway in the data path

To close the gap, the enforcement point must sit in the data path, between the identity holder and Snowflake. hoop.dev provides a Layer 7 gateway that proxies every Snowflake connection. By routing traffic through hoop.dev, the organization gains a single place where policy can be applied to the actual query and its result set. hoop.dev verifies the OIDC token, maps group membership to Snowflake roles, and then inspects each SQL statement before it reaches the database. If a statement attempts to export more rows than allowed, hoop.dev can pause the request for a human approver. If a result set contains columns marked as sensitive, hoop.dev masks those fields in real time. Every session is recorded for replay, giving investigators a complete picture of what the AI agent did.
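The two content-based checks described above, pausing large reads for approval and masking sensitive columns, can be sketched in a few lines. This is an illustrative model only, not hoop.dev's API or configuration format: `Policy`, `Decision`, `decide`, and `mask_row` are hypothetical names, and the row estimate is assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class Policy:
    max_rows: int                      # rows a query may return without sign-off
    sensitive_columns: frozenset[str]  # lower-cased column names to mask


def decide(estimated_rows: int, policy: Policy) -> Decision:
    """Pause large reads for a human approver; let small ones through."""
    if estimated_rows > policy.max_rows:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW


def mask_row(row: dict[str, object], policy: Policy,
             placeholder: str = "****") -> dict[str, object]:
    """Return a copy of the row with sensitive values replaced."""
    return {
        col: placeholder if col.lower() in policy.sensitive_columns else value
        for col, value in row.items()
    }


# Hypothetical policy: up to 1,000 rows without approval, two masked columns.
policy = Policy(max_rows=1000, sensitive_columns=frozenset({"email", "ssn"}))
print(decide(5000, policy))
print(mask_row({"ID": 1, "EMAIL": "jane@example.com"}, policy))
```

The point of the sketch is placement rather than logic: both functions need to see the statement and the result set, which is only possible at a component that sits between the caller and Snowflake.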

Key enforcement capabilities

  • Session recording – hoop.dev captures the full request and response stream, so any accidental data leak can be reviewed later.
  • Inline masking – hoop.dev replaces sensitive column values with placeholders before the results reach the user or agent.
  • Just‑in‑time approval – hoop.dev can require a manual sign‑off when a query exceeds a predefined risk threshold.
  • Command blocking – hoop.dev can reject statements that match a deny list, such as DROP TABLE or COPY INTO an external location (Snowflake's unload command).
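
As an illustration of command blocking, a deny list can be modeled as a set of patterns checked against each statement. The sketch below is an assumption about how such a check might look, not hoop.dev's implementation. The names `DENY_PATTERNS` and `is_blocked` are hypothetical. It relies on the fact that Snowflake unloads data with COPY INTO a stage (`@name`) or a cloud storage URL, while COPY INTO a table is a load.

```python
import re

# Hypothetical deny list. Each pattern is anchored at the start of the statement.
DENY_PATTERNS = [
    re.compile(r"^\s*DROP\s+TABLE\b", re.IGNORECASE),
    # COPY INTO a stage or a cloud URL exports data; COPY INTO a table imports it.
    re.compile(r"^\s*COPY\s+INTO\s+(@|'?(s3|gcs|azure)://)", re.IGNORECASE),
]


def is_blocked(statement: str) -> bool:
    """Return True when the statement matches any deny-list pattern."""
    return any(pattern.search(statement) for pattern in DENY_PATTERNS)


for sql in (
    "DROP TABLE customers",
    "COPY INTO 's3://some-bucket/out/' FROM customers",
    "COPY INTO customers FROM @my_stage",
    "SELECT id FROM customers LIMIT 10",
):
    print(is_blocked(sql), sql)
```

Regular expressions are easy to evade, for example with a leading comment or several statements in one request, so a production check would more likely parse the SQL first. The sketch only shows where a deny list fits in the request path.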

Getting started

Review the getting started guide to deploy the gateway and register a Snowflake connection. The learn section explains how to define masking rules and approval workflows that match your data‑governance policies.

FAQ

Can Snowflake native logs replace a gateway?

Snowflake logs show that a query ran, but they cannot stop the query, mask the data, or require an approval step. The gateway is the only place where those controls can be enforced.

Does hoop.dev store credentials for Snowflake?

No. The gateway holds the credential only long enough to establish the backend connection. Users and agents never see the secret.

Is the solution open source?

Yes. The entire gateway, including the data‑path enforcement engine, is MIT licensed and available on GitHub.

Explore the source code on GitHub (hoophq/hoop).