PII/PHI redaction for AI agents on Snowflake

A team scopes its AI agent to a single analytics schema, records every query, and feels done. Then someone notices that schema has a raw email column and a date-of-birth field, and the agent has been reading both in the clear the whole time. Scoping limited where the agent could go. It did nothing about what came back. The skipped control is PII/PHI redaction.

This is the control that hides behind the others. Access scoping and recording are visible and satisfying to set up. Redaction is quieter, so teams assume "read-only on one schema" is safe. Read-only on a schema full of regulated fields is still a path to raw PII and PHI.

The control teams forget

The missing piece is redacting regulated fields in the data the agent receives, so it reads production with PCI, PHI, and PII masked and without a separate data-lake copy. The agent still gets the real, useful columns; only the identifiers come back redacted. Without this, every other control governs where the agent goes, and none governs what it sees.

PII/PHI redaction is the control most likely to be skipped because the alternatives feel responsible. Pointing the agent at a "non-prod" copy seems safe until you check whether that copy was ever actually scrubbed, and how often it goes stale. Trusting that the analytics schema "does not have sensitive data" works right up until a well-meaning pipeline adds a customer email for a join. The only durable answer is to redact at the point the data leaves the warehouse, on every query, regardless of which schema or table the regulated field turns up in.

That is why redaction belongs on the connection rather than in a data-prep step. A sanitized extract protects the rows you remembered to sanitize. Connection-level PII/PHI redaction protects whatever the DLP provider classifies as sensitive in the live results, including fields that arrived after you last reviewed the schema. The agent reads current production data, and the regulated columns are masked on the way out whether or not anyone anticipated them.
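
To make the gap concrete, here is a minimal Python sketch of why a sanitized extract misses fields that arrive later. Everything in it is an assumption for illustration: the `EMAIL` regex stands in for a real DLP classifier, and `REVIEWED_COLUMNS`, `sanitize_extract`, `leaked_columns`, and the sample row are hypothetical names, not part of hoop.dev or Snowflake.

```python
import re

# Assumption: a single regex stands in for a real DLP classifier.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical list of columns someone reviewed when the extract was built.
REVIEWED_COLUMNS = {"email"}


def sanitize_extract(row):
    """Data-prep approach: redact only the columns on the reviewed list."""
    return {
        column: ("[REDACTED]" if column in REVIEWED_COLUMNS else value)
        for column, value in row.items()
    }


def leaked_columns(row):
    """Return the columns whose string values still look like an email."""
    return [
        column
        for column, value in row.items()
        if isinstance(value, str) and EMAIL.search(value)
    ]


# A pipeline later added `join_key` for a join; nobody updated the reviewed list.
row = {
    "order_id": 42,
    "email": "ana@example.com",
    "join_key": "ana@example.com",
    "total": 19.5,
}

print(leaked_columns(sanitize_extract(row)))  # ['join_key']
```

The column list only covers what was known at review time. Classifying the values in the live results does not depend on anyone having anticipated `join_key`.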

Add redaction at the connection

hoop.dev is an open-source Layer 7 access gateway. It proxies the Snowflake connection through an in-network agent, so engineers and AI agents query real Snowflake data through hoop.dev, and the masking plugin redacts regulated fields in returned data before it reaches the client.

  1. Register the Snowflake connection in hoop.dev. The gateway holds the warehouse credential and brokers access as the session principal.
  2. Attach a DLP provider, Presidio or Google DLP, to classify the streaming results.
  3. Enable masking on the connection and select the field types: emails, card numbers, national IDs, health identifiers.
  4. Connect the agent and run a query. Regulated columns return redacted; the rest returns intact.
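
Conceptually, steps 2 through 4 amount to classifying each row as it streams back and masking only the selected field types. The Python sketch below illustrates that idea under stated assumptions; it is not hoop.dev's implementation or configuration format. The regexes stand in for a DLP provider such as Presidio or Google DLP, and `PATTERNS`, `MaskingConfig`, and `mask_stream` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: simple regexes stand in for the DLP provider's classifiers.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


@dataclass
class MaskingConfig:
    """Hypothetical per-connection setting: which field types to redact."""
    field_types: Tuple[str, ...] = ("email", "card_number")


def mask_stream(
    rows: Iterable[Dict[str, object]], config: MaskingConfig
) -> Iterator[Dict[str, object]]:
    """Yield rows one at a time with the selected field types redacted."""
    active = [(name, PATTERNS[name]) for name in config.field_types]
    for row in rows:
        masked = {}
        for column, value in row.items():
            if isinstance(value, str):
                for name, pattern in active:
                    value = pattern.sub(f"<{name}>", value)
            masked[column] = value  # non-string values pass through unchanged
        yield masked


rows = [
    {"customer": "Ana", "contact": "ana@example.com",
     "card": "4111 1111 1111 1111", "plan": "pro"},
]

for row in mask_stream(rows, MaskingConfig()):
    print(row)
```

With the default selection, `contact` and `card` come back as `<email>` and `<card_number>` while `customer` and `plan` are untouched. Passing `MaskingConfig(field_types=("email",))` leaves the card value intact, which mirrors choosing field types per connection.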

Verify the redaction

Run a query that selects a known PII or PHI column through the agent and confirm the value comes back redacted. Then query Snowflake directly and confirm the raw value is unchanged at rest. PII/PHI redaction happens on the returned data, not in the warehouse, so production stays whole while the agent only sees the masked view.
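
That check can be scripted. The sketch below is one possible shape, assuming you supply two callables that run the same SQL: one through the gateway and one directly against Snowflake. The names `verify_redaction`, `fake_gateway`, and `fake_direct` are hypothetical, and the fakes only stand in for real connections so the example runs on its own. It also assumes the query returns rows in a stable order (for example via `ORDER BY`).

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
RunQuery = Callable[[str], List[Row]]


def verify_redaction(via_gateway: RunQuery, direct: RunQuery,
                     sql: str, column: str) -> None:
    """Assert `column` is masked through the gateway and raw when queried directly."""
    masked_rows = via_gateway(sql)
    raw_rows = direct(sql)
    assert len(masked_rows) == len(raw_rows), "row counts differ"
    for masked, raw in zip(masked_rows, raw_rows):
        # The regulated column must not match the raw value stored at rest.
        assert masked[column] != raw[column], f"{column} came back unmasked"
        # Every other column should be returned intact.
        rest_masked = {k: v for k, v in masked.items() if k != column}
        rest_raw = {k: v for k, v in raw.items() if k != column}
        assert rest_masked == rest_raw, "non-regulated columns changed"


# Stand-ins for real connections, used here only to make the sketch runnable.
SQL = "SELECT id, email FROM customers ORDER BY id"


def fake_direct(sql: str) -> List[Row]:
    return [{"id": 1, "email": "ana@example.com"}]


def fake_gateway(sql: str) -> List[Row]:
    return [{"id": 1, "email": "[REDACTED]"}]


verify_redaction(fake_gateway, fake_direct, SQL, "email")
print("redaction verified")
```

In a real check, replace the two fakes with functions that execute the query over each path. A `NULL` in the regulated column would trip the inequality assertion, so pick rows with known non-null values.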

Pitfalls

  • Assuming scoped access implies safe data. A narrow schema can still hold raw identifiers. Add redaction.
  • Building a sanitized copy instead. That reintroduces a data-lake copy to secure and keep current.
  • Skipping the DLP provider. Without it, nothing classifies the regulated spans in the results, so the gateway has nothing to mask.

See how the masking plugin redacts regulated fields, and follow the getting-started guide to configure the connection.

FAQ

Does scoping access remove the need for PII/PHI redaction?

No. Scoping controls where the agent can query. Redaction controls what comes back. A scoped agent can still read raw identifiers without it.

Does redaction make a copy of the data?

No. Regulated fields are masked in the returned results before they reach the agent, with no data-lake copy.

What decides which fields are PII or PHI?

A configured DLP provider classifies the streaming content, and the gateway redacts those fields before results return.

hoop.dev is open source. Read the masking code and add PII/PHI redaction to your agents' Snowflake access at github.com/hoophq/hoop.

Hoop is MIT-licensed infrastructure for controlling how AI agents reach production data. Star hoophq/hoop so you can inspect it, deploy it, or share it when your team starts governing agent access.
