PII/PHI redaction for AI coding agents on Snowflake

An AI‑assisted code generation pipeline pulls schema information from Snowflake to suggest query completions, but it can also run queries, and without PII/PHI redaction the rows those queries return can expose sensitive data. The pipeline runs under a service account that authenticates with a static Snowflake username and password. When a developer asks the assistant to retrieve customer data, the raw rows, including names, Social Security numbers, and medical codes, flow straight back to the model.

Because the connection bypasses any data‑loss prevention layer, the language model can memorize or regurgitate personally identifiable information (PII) and protected health information (PHI). Auditors later discover that query logs contain full payloads, and the organization cannot prove that sensitive fields were ever filtered.

The engineering team wants to enforce PII/PHI redaction on every response that leaves Snowflake, but they do not want to rewrite every client or embed a masking library in the AI service. They also need a record of who asked for which data and a way to block queries that attempt to exfiltrate large batches of personal records.

Why PII/PHI redaction matters for AI coding agents on Snowflake

AI coding assistants work with whatever data lands in their context, and that data may also be logged or retained for later training. If unfiltered rows contain health identifiers or credit‑card numbers, the model can inadvertently surface that information in unrelated code suggestions, creating a compliance breach. Under regulations such as HIPAA and GDPR, an unauthorized disclosure can be a violation even when it is inadvertent, and the penalties can be severe.

Beyond legal risk, unmasked data widens the attack surface. A compromised assistant instance could be used as a data exfiltration channel, sending raw PII to an external endpoint. Without a central control point, each Snowflake client would need its own masking logic, leading to inconsistent policies and operational overhead.

How hoop.dev enforces inline masking for Snowflake

hoop.dev sits in the data path between the AI agent and Snowflake, inspecting each Snowflake response and applying inline masking before the data reaches the model. The gateway holds the Snowflake credentials, so the AI service never sees a secret. Identity is still verified through OIDC, ensuring that only authorized service accounts can initiate a session.

hoop.dev records every query and its result, creating an audit trail that auditors can review. When a request matches a policy that requires human approval – for example, a SELECT that touches a table flagged as containing PHI – the gateway pauses the flow and routes the request to an approver. Only after explicit consent does the query continue.
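
The approval flow described above can be pictured as a small decision function. The following is a minimal sketch, not hoop.dev's implementation: the `Decision` values, the `PHI_TABLES` set, the `MAX_PHI_ROWS` threshold, and the regex-based table detection are all assumptions made for illustration. A real gateway would use a proper SQL parser and its own policy format.

```python
import re
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Hypothetical policy values, chosen only for this example.
PHI_TABLES = {"patients", "claims"}
MAX_PHI_ROWS = 100


def referenced_tables(sql):
    """Naively collect the table names that follow FROM or JOIN."""
    names = re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, flags=re.IGNORECASE)
    # Keep only the last part of a qualified name such as db.schema.table.
    return {name.split(".")[-1].lower() for name in names}


def evaluate(sql):
    """Decide whether a query may run, needs approval, or is blocked."""
    if not referenced_tables(sql) & PHI_TABLES:
        return Decision.ALLOW
    limit = re.search(r"\blimit\s+(\d+)", sql, flags=re.IGNORECASE)
    # Unbounded or very large reads of PHI tables are treated as bulk export.
    if limit is None or int(limit.group(1)) > MAX_PHI_ROWS:
        return Decision.DENY
    return Decision.REQUIRE_APPROVAL
```

Under these assumptions, a query against an unflagged table passes straight through, a small bounded read of a flagged table waits for an approver, and an unbounded read of a flagged table is blocked outright.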

hoop.dev masks sensitive fields in real time using a pluggable redaction engine. The policy can specify column‑level patterns such as social security numbers, medical codes, or email addresses. The gateway rewrites the response, replacing each match with a placeholder before the data is handed to the AI model. Because the transformation happens inside the gateway, no copy of the raw data is ever written to disk or streamed to the downstream service.
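
To make the idea of pattern-based rewriting concrete, here is a rough Python sketch of what replacing matches with placeholders can look like. It does not reflect hoop.dev's redaction engine or its policy syntax. The `PATTERNS` table, the placeholder format, and the function names are hypothetical, and the two regular expressions are simplified examples rather than production-grade detectors.

```python
import re

# Hypothetical patterns for illustration; real policies need patterns
# tuned and tested against the actual data formats in use.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact_value(value):
    """Replace every pattern match in a string with a typed placeholder."""
    if not isinstance(value, str):
        return value  # numbers, None, etc. pass through unchanged
    for label, pattern in PATTERNS.items():
        value = pattern.sub(f"[REDACTED:{label}]", value)
    return value


def redact_rows(rows):
    """Return a masked copy of a result set given as a list of dicts."""
    return [{col: redact_value(val) for col, val in row.items()} for row in rows]


rows = [{"name": "Ada", "ssn": "123-45-6789", "contact": "ada@example.com", "visits": 3}]
masked = redact_rows(rows)
```

Note that the `name` column is left untouched in this sketch: free-form values such as names rarely match a reliable pattern, which is one reason a policy might combine pattern matching with column-level rules.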

Session recording is another enforcement outcome. hoop.dev captures the full request‑response exchange, timestamps, and the identity that initiated it. Teams can replay a session to understand exactly what data was requested and how it was redacted, satisfying audit requirements without relying on client‑side logs.
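
As a way to picture what such a record might hold, the sketch below models one recorded exchange as an immutable object and serializes it to a single JSON log line. The `SessionRecord` type and its field names are assumptions for illustration only and are not hoop.dev's recording format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SessionRecord:
    """One recorded request-response exchange (hypothetical shape)."""
    identity: str      # who initiated the request
    query: str         # the SQL text that was sent
    masked_rows: list  # the response as seen after redaction
    decision: str      # e.g. "allowed", "approved", "denied"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(record):
    """Serialize a record as one JSON line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = SessionRecord(
    identity="svc-ai-assistant@example.com",
    query="SELECT ssn FROM patients LIMIT 5",
    masked_rows=[{"ssn": "[REDACTED:SSN]"}],
    decision="approved",
)
line = to_log_line(record)
```

Storing only the masked rows in the record keeps the audit trail itself from becoming a second copy of the sensitive data.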

Deploying the gateway for Snowflake

The deployment model is container‑based; a Docker Compose file launches the gateway and the network‑resident agent that talks to Snowflake. Configuration of the Snowflake connection – host, warehouse, and the service‑account credentials – lives in the gateway’s manifest, not in the AI code. The getting started guide walks through the minimal steps to register a Snowflake target and define a masking policy.

Once the gateway is running, the AI service connects using its normal Snowflake client library, pointing the client to the gateway’s host and port. From the client’s perspective nothing changes; the gateway transparently proxies the wire protocol and applies the policies you have configured. The feature documentation describes the available options.

Because hoop.dev is the only component that can see the clear response, the enforcement outcomes – masking, approval, recording – exist solely because the gateway sits in the data path. Removing hoop.dev would revert the system to the original, unprotected state described at the start of this article.

FAQ

Does hoop.dev store raw Snowflake data?

No. The gateway masks data before it leaves the process, and only the redacted payload is forwarded to the downstream AI service. Recorded sessions contain the masked view, preserving privacy while still providing a complete audit trail.

Can I apply different masking rules per table?

Yes. Policies are defined per connection and can target specific columns or patterns. This granularity lets you treat a table of employee salaries differently from a table of anonymized analytics data.
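
One way to think about per-table granularity is a rule table keyed by table and column. The sketch below is illustrative only: the `RULES` structure, the `full` and `partial` strategy names, and the table and column names are hypothetical and do not represent hoop.dev's policy format.

```python
# Hypothetical rule table: {table: {column: masking strategy}}.
RULES = {
    "employees": {"salary": "full", "email": "partial"},
    "patients": {"ssn": "full", "diagnosis_code": "full"},
}


def mask(value, strategy):
    """Mask a single value according to the named strategy."""
    text = str(value)
    if strategy == "full":
        return "[REDACTED]"
    if strategy == "partial":
        # Keep the last four characters and star out the rest.
        return "*" * max(len(text) - 4, 0) + text[-4:]
    raise ValueError(f"unknown masking strategy: {strategy}")


def apply_rules(table, row):
    """Mask the columns that have a rule for this table; leave others as-is."""
    rules = RULES.get(table, {})
    return {
        col: mask(val, rules[col]) if col in rules else val
        for col, val in row.items()
    }


employee = apply_rules(
    "employees", {"name": "Ada", "salary": 120000, "email": "ada@example.com"}
)
page_view = apply_rules("page_views", {"path": "/home", "count": 7})
```

Here the salary table is masked column by column, while the analytics table has no rules and passes through unchanged.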

What happens if a query is denied?

hoop.dev returns an explicit denial response to the client, indicating that the request requires approval or violates a guardrail. The denial is also recorded, so you have evidence of the attempted access.

Explore the source and contribute on GitHub.
