Data masking for AI agents on Snowflake

If you only do one thing before pointing an AI agent at Snowflake, do this: turn on data masking for the connection, so the agent reads production data with regulated fields redacted and without a data-lake copy. This is a start-to-finish setup walkthrough, so you can have a masked connection working today.

The goal is narrow and worth it. The agent queries the real warehouse and gets accurate results, while emails, card numbers, and health identifiers come back redacted. There is no sanitized extract to maintain and no copy to secure. Everything below is the order of operations to get there.

Step 1: register the Snowflake connection

hoop.dev is an open-source Layer 7 access gateway. Add Snowflake as a connection so engineers and AI agents query real Snowflake data through hoop.dev rather than connecting directly. Provide the warehouse credentials once. The gateway holds them and brokers access as the session principal, so the agent never carries the password.
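Here is a minimal Python sketch of the credential-brokering idea. The class and field names (`StoredConnection`, `AgentSession`, `Gateway`) are hypothetical and are not hoop.dev's API. The sketch shows one property: the secret lives only in the gateway's store, and the caller gets back a session bound to its own identity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredConnection:
    # Hypothetical server-side record; only the gateway ever holds it.
    name: str
    account: str
    user: str
    password: str = field(repr=False)  # keep the secret out of logs and reprs


@dataclass(frozen=True)
class AgentSession:
    # What the caller receives: its own identity and a connection name, no secret.
    principal: str
    connection: str


@dataclass
class Gateway:
    connections: dict = field(default_factory=dict)

    def register(self, conn: StoredConnection) -> None:
        # Credentials are provided once, when the connection is registered.
        self.connections[conn.name] = conn

    def open_session(self, principal: str, connection: str) -> AgentSession:
        if connection not in self.connections:
            raise KeyError(f"unknown connection: {connection}")
        # A real gateway would use the stored credentials here to reach
        # Snowflake; the caller only gets a session tied to its principal.
        return AgentSession(principal=principal, connection=connection)


gw = Gateway()
gw.register(StoredConnection("snowflake-prod", "acme", "svc_gateway", "s3cret"))
session = gw.open_session("agent@example.com", "snowflake-prod")
print(session)
```

In a real deployment the stored record would sit in the gateway's secret storage and `open_session` would open the upstream connection. Both are omitted here.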

Step 2: attach a DLP provider

Masking on a Snowflake connection sends the streaming results to a configured DLP provider for classification. Wire up Presidio or Google DLP. The provider identifies which spans in the returned rows are sensitive, and the gateway uses that to redact before anything reaches the agent.
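The division of labor (the provider finds spans, the gateway rewrites them) can be sketched in Python. This is a toy stand-in, not Presidio or Google DLP. Two regular expressions play the provider, and the entity labels (`EMAIL_ADDRESS`, `CREDIT_CARD`) are used only for readability. Real providers use much richer detectors and attach confidence scores.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    entity_type: str
    start: int
    end: int


# Toy detectors standing in for a real DLP provider; deliberately simple.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    # 13 to 16 digits, optionally separated by single spaces or dashes.
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def classify(text: str) -> list[Span]:
    """Provider's job: report which character spans in a value are sensitive."""
    spans = []
    for entity, pattern in DETECTORS.items():
        for m in pattern.finditer(text):
            spans.append(Span(entity, m.start(), m.end()))
    return spans


def redact(text: str, spans: list[Span]) -> str:
    """Gateway's job: replace each reported span with a placeholder token."""
    # Go right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[: span.start] + f"<{span.entity_type}>" + text[span.end :]
    return text


value = "card 4111 1111 1111 1111 on file"
print(redact(value, classify(value)))  # card <CREDIT_CARD> on file
```

Replacing from right to left matters because placeholders rarely match the length of the text they replace. Working left to right would shift every later span's offsets.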

Step 3: enable masking on the connection

Turn on the masking plugin for the Snowflake connection and choose the field types to redact: email, credit card, national ID, health identifiers. This is per connection, not global, so set it explicitly. Once it is on, data masking applies to every result the agent receives on that connection.
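The following Python sketch models a per-connection setting as a small lookup. The `MaskingConfig` shape, the connection names, and the field-type labels are hypothetical, not hoop.dev's configuration schema. The sketch captures two behaviors: masking defaults to off, and it applies only where a connection enables it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MaskingConfig:
    # Illustrative shape only: off unless explicitly enabled.
    enabled: bool = False
    field_types: frozenset = field(default_factory=frozenset)


CONNECTIONS = {
    "snowflake-prod": MaskingConfig(
        enabled=True,
        field_types=frozenset({"EMAIL_ADDRESS", "CREDIT_CARD", "NATIONAL_ID", "HEALTH_ID"}),
    ),
    # Registered but never configured for masking: nothing is redacted here.
    "snowflake-dev": MaskingConfig(),
}


def entities_to_redact(connection: str) -> frozenset:
    """Field types to redact on this connection; empty if masking is off."""
    cfg = CONNECTIONS.get(connection, MaskingConfig())
    return cfg.field_types if cfg.enabled else frozenset()


print(sorted(entities_to_redact("snowflake-prod")))
print(sorted(entities_to_redact("snowflake-dev")))  # []
```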

Step 4: connect the agent and test

Connect the agent to the gateway's built-in MCP server, authenticating through your identity provider, then run a query.

SELECT customer_id, email, plan FROM analytics.customers LIMIT 5;

The email column comes back redacted while customer_id and plan come back intact. That is the masked production read working end to end.
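You can simulate what the agent should see without a warehouse. This Python sketch makes two assumptions for illustration: made-up rows stand in for the query's result, and a single email pattern stands in for the DLP provider. It builds new rows rather than editing the input, mirroring how the gateway redacts returned data and leaves stored data alone.

```python
import re

# Simplified email pattern standing in for the DLP provider's classification.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_rows(rows):
    """Return new rows with email-shaped substrings redacted; inputs are untouched."""
    return [
        {
            col: EMAIL.sub("<EMAIL_ADDRESS>", val) if isinstance(val, str) else val
            for col, val in row.items()
        }
        for row in rows
    ]


# Made-up rows standing in for:
# SELECT customer_id, email, plan FROM analytics.customers LIMIT 5;
rows = [
    {"customer_id": 101, "email": "ana@example.com", "plan": "pro"},
    {"customer_id": 102, "email": "raj@example.org", "plan": "free"},
]

for row in mask_rows(rows):
    print(row)
# {'customer_id': 101, 'email': '<EMAIL_ADDRESS>', 'plan': 'pro'}
# {'customer_id': 102, 'email': '<EMAIL_ADDRESS>', 'plan': 'free'}
```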

Step 5: confirm no copy and correct redaction

Query Snowflake directly with an admin client and confirm the email is unchanged at rest. The redaction happened on the returned data, not in the warehouse, and no data-lake copy was created. That difference is the whole reason to mask on the connection instead of pre-sanitizing into a shadow table.
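The comparison can be written as a check. In this Python sketch, `admin_rows` stands for the direct admin read and `gateway_rows` for the same query run through the gateway. The two table listings stand for `SHOW TABLES` output captured before and after the test. The function names and the single `<EMAIL_ADDRESS>` placeholder are assumptions for illustration.

```python
def check_masking(admin_rows, gateway_rows, masked_cols, placeholder="<EMAIL_ADDRESS>"):
    """Compare a direct admin read with the gateway read of the same query.

    Returns a list of problems; an empty list means the check passed.
    Assumes one placeholder token for every masked column.
    """
    problems = []
    if len(admin_rows) != len(gateway_rows):
        problems.append("row counts differ")
    for raw, seen in zip(admin_rows, gateway_rows):
        for col, raw_val in raw.items():
            if col in masked_cols:
                if raw_val == placeholder:
                    problems.append(f"{col}: redacted at rest, so masking touched the warehouse")
                if seen.get(col) != placeholder:
                    problems.append(f"{col}: reached the agent unredacted")
            elif seen.get(col) != raw_val:
                problems.append(f"{col}: changed although it is not masked")
    return problems


def new_tables(tables_before, tables_after):
    """Tables that appeared between two listings, e.g. two SHOW TABLES runs."""
    return set(tables_after) - set(tables_before)


admin = [{"customer_id": 101, "email": "ana@example.com", "plan": "pro"}]
through_gateway = [{"customer_id": 101, "email": "<EMAIL_ADDRESS>", "plan": "pro"}]
print(check_masking(admin, through_gateway, {"email"}))  # []
print(new_tables({"CUSTOMERS"}, {"CUSTOMERS"}))          # set()
```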

Step 6: tune the field types against real queries

The first pass of data masking is usually too broad or too narrow. Run the queries your agent actually issues and read the results. If a column the agent needs for its task comes back redacted, narrow the field type so the analytic value stays legible. If a sensitive column slips through, widen the classification. The DLP provider does the detection, but you decide which categories matter for this connection, so treat the field-type list as something you refine against live queries rather than set once and forget.
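To make the tuning loop systematic, compare a sample of masked results against two lists you maintain: columns the agent needs to read and columns that must be redacted. This Python sketch assumes redacted values look like `<ENTITY>` placeholders. Adjust the hypothetical `looks_redacted` helper to match the token format your setup produces.

```python
def looks_redacted(value):
    """Treat '<ENTITY>' placeholder strings as redacted (an assumed token format)."""
    return isinstance(value, str) and value.startswith("<") and value.endswith(">")


def tuning_report(masked_rows, needed_cols, sensitive_cols, is_redacted=looks_redacted):
    """Flag columns masked too broadly or too narrowly in a sample of results."""
    redacted = {col for row in masked_rows for col, val in row.items() if is_redacted(val)}
    return {
        # Redacted, but the agent needs these columns to do its task.
        "too_broad": sorted(redacted & set(needed_cols)),
        # Should be redacted, but came through in the clear in every sampled row.
        "too_narrow": sorted(set(sensitive_cols) - redacted),
    }


sample = [
    {"email": "<EMAIL_ADDRESS>", "phone": "555-0100", "region": "<LOCATION>", "plan": "pro"},
]
print(tuning_report(sample, needed_cols={"region", "plan"}, sensitive_cols={"email", "phone"}))
# {'too_broad': ['region'], 'too_narrow': ['phone']}
```

Here `region` points to a field type that should be narrowed, and `phone` to a classification that should be widened.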

A useful default is to redact direct identifiers (email, phone, card number, national ID, health identifiers) and leave behavioral and aggregate columns alone. An agent computing churn does not need a real email; it needs the plan, the login dates, and the usage counts. Mask the first set, keep the second, and the agent does its job on production data with nothing regulated leaving the warehouse in the clear.
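To check that the analytic work survives masking, here is a Python sketch of a churn calculation over masked rows. The rows, the 60-day inactivity threshold, and the column names are invented for illustration. The function never reads the `email` column, so its redaction costs nothing.

```python
from datetime import date

# Masked rows as the agent might receive them: the identifier is redacted,
# behavioral columns are intact. All values are made up.
rows = [
    {"email": "<EMAIL_ADDRESS>", "plan": "pro", "last_login": date(2024, 5, 30)},
    {"email": "<EMAIL_ADDRESS>", "plan": "free", "last_login": date(2024, 3, 2)},
    {"email": "<EMAIL_ADDRESS>", "plan": "pro", "last_login": date(2024, 2, 11)},
    {"email": "<EMAIL_ADDRESS>", "plan": "free", "last_login": date(2024, 6, 1)},
]


def churn_rate_by_plan(rows, as_of, inactive_days=60):
    """Share of customers per plan with no login in the last `inactive_days` days."""
    totals, churned = {}, {}
    for r in rows:
        totals[r["plan"]] = totals.get(r["plan"], 0) + 1
        if (as_of - r["last_login"]).days > inactive_days:
            churned[r["plan"]] = churned.get(r["plan"], 0) + 1
    return {plan: churned.get(plan, 0) / n for plan, n in totals.items()}


print(churn_rate_by_plan(rows, as_of=date(2024, 6, 1)))
# {'pro': 0.5, 'free': 0.5}
```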

Pitfalls

Forgetting the DLP provider. Without it the gateway has nothing classifying the stream, so wire it up first.
Assuming masking is on by default. It is configured per connection, so enable it on the Snowflake connection and pick the field types.
Redacting too much. Mask identifiers and leave the analytic fields the agent needs to do its job.
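The three pitfalls lend themselves to a preflight check before the agent goes live. This Python sketch assumes a hypothetical dictionary shape for connection settings (`dlp_provider`, `masking.enabled`, `masking.field_types`). Map it onto however your gateway actually exposes those settings. The analytic field-type labels passed in are also illustrative.

```python
def preflight(conn: dict, analytic_types: set) -> list[str]:
    """Return warnings for the common pitfalls; an empty list means ready."""
    warnings = []
    # Pitfall 1: no provider means nothing classifies the result stream.
    if not conn.get("dlp_provider"):
        warnings.append("no DLP provider configured: nothing will classify results")
    # Pitfall 2: masking is per connection and off unless enabled.
    masking = conn.get("masking", {})
    if not masking.get("enabled", False):
        warnings.append("masking is not enabled on this connection")
    field_types = set(masking.get("field_types", []))
    if masking.get("enabled") and not field_types:
        warnings.append("masking is enabled but no field types are selected")
    # Pitfall 3: redacting field types the agent needs for analysis.
    overreach = field_types & set(analytic_types)
    if overreach:
        warnings.append(f"masking analytic field types: {sorted(overreach)}")
    return warnings


ready = {
    "dlp_provider": "presidio",
    "masking": {"enabled": True, "field_types": ["EMAIL_ADDRESS", "CREDIT_CARD"]},
}
print(preflight(ready, analytic_types={"DATE_TIME"}))  # []
```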

For deeper reading, see the documentation on how the masking plugin classifies and redacts results, and the getting-started guide for setting up a first connection.

FAQ

How long does this setup take?

Once the DLP provider is wired up, enabling data masking on the connection and testing a query is a short task, not a project.

Does masking change the data in Snowflake?

No. Redaction applies to the returned results before they reach the agent. The warehouse data is unchanged at rest.

Can humans use the same masked connection?

Yes. The same masked path serves engineers and agents alike, so support staff can query production without seeing raw PII.

hoop.dev is open source. Follow the setup against the real code at github.com/hoophq/hoop.