Here is the end-state worth building toward: an AI agent reads production Snowflake data with regulated fields masked, there is no data-lake copy, and you never have to choose between giving the agent real data and exposing PII. The agent gets accurate, useful results, and PCI data, PHI, and PII never reach it in the clear. Data masking on the connection is what makes that picture real.
Most teams reach for the alternative: copy a sanitized extract into a separate dataset and point the agent at that. It works until the copy goes stale, multiplies, and becomes its own thing to secure. The better end-state has no copy at all. The agent queries production, and sensitive columns are redacted in the returned rows before they reach it.
The end-state in detail
Picture the agent issuing SELECT email, plan, last_login FROM customers. It gets back real plans and login times, the data it actually needs, while the email column comes back redacted. Nothing was copied to a lake, nothing was pre-sanitized into a shadow table, and the agent could not have reached the raw email even if it tried. That is data masking applied inline on the connection, not a separate pipeline.
Snowflake has native dynamic data masking through masking policies on columns, and it is a legitimate tool. But it ties redaction to the Snowflake role the connection uses, and agents usually connect through a single shared role, so the policy either masks for every agent or none. It also lives inside the warehouse, which means the team that manages Snowflake objects owns the redaction logic, separate from whoever governs the agents. Masking on the connection puts the redaction at the same boundary that already handles the agent's identity and access, so a security team controls what an agent sees without editing warehouse objects or minting a role per agent.
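To make the contrast concrete, here is a minimal Python sketch of the two decision models. It is an illustration only, not Snowflake's policy engine or hoop.dev's code. The role name `AGENT_RO`, the agent names, and both policy tables are hypothetical.

```python
from dataclasses import dataclass

MASK = "[REDACTED]"


@dataclass(frozen=True)
class Session:
    principal: str       # who is actually asking (hypothetical agent name)
    warehouse_role: str  # Snowflake role the connection uses


# Role-keyed policy: the only input is the shared warehouse role.
ROLE_POLICY = {"AGENT_RO": {"email"}}


def mask_by_role(session: Session, column: str, value: str) -> str:
    """Every session that shares a role gets the same masking decision."""
    masked = ROLE_POLICY.get(session.warehouse_role, set())
    return MASK if column in masked else value


# Principal-keyed policy, held at the connection boundary (hypothetical).
PRINCIPAL_POLICY = {
    "support-agent": {"email"},
    "marketing-agent": {"email", "last_login"},
}
DEFAULT_MASKED = {"email", "last_login"}  # unknown principals get the strictest set


def mask_by_principal(session: Session, column: str, value: str) -> str:
    """The decision follows the caller's identity, not the shared role."""
    masked = PRINCIPAL_POLICY.get(session.principal, DEFAULT_MASKED)
    return MASK if column in masked else value


support = Session("support-agent", "AGENT_RO")
marketing = Session("marketing-agent", "AGENT_RO")
```

Both sessions use the same role, so `mask_by_role` cannot tell them apart. `mask_by_principal` can tighten the policy for one agent without creating a second warehouse role.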
The end-state is the same either way: no clear-text PII in the agent's hands. The difference is where the control lives and who can change it. Putting it on the connection keeps masking, scope, and recording on one surface instead of splitting them across separate teams.
How to reach it with the gateway
hoop.dev is an open-source Layer 7 access gateway. It proxies the Snowflake connection through an in-network agent, so engineers and AI agents query real Snowflake data through hoop.dev, and the masking plugin redacts regulated fields in the returned data before it reaches the client.
- Register the Snowflake connection in hoop.dev. The gateway brokers access as the session principal and holds the warehouse credential.
- Enable the masking plugin on the connection and point it at a configured DLP provider, such as Presidio or Google DLP, which classifies the streaming results.
- Define which field types to redact: emails, card numbers, national IDs, health identifiers. The provider classifies; the gateway redacts before return.
- Connect the agent through the gateway and run a query. Sensitive columns come back masked, the rest comes back real.
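The classify-then-redact flow in the steps above can be sketched in a few lines of standard-library Python. This is an assumption-laden illustration, not hoop.dev's masking plugin: the regex detectors are a crude stand-in for a real DLP provider, and the names `DETECTORS`, `redact_value`, and `mask_rows` are hypothetical.

```python
import re
from typing import Any, Dict, Iterable, Iterator, Set

# Stand-in detectors. A real DLP provider classifies far more robustly;
# these patterns exist only to make the example runnable.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact_value(value: Any, enabled: Set[str]) -> Any:
    """Replace every detected span in a string value with a type placeholder."""
    if not isinstance(value, str):
        return value  # numbers, None, etc. pass through untouched
    for field_type in sorted(enabled):
        detector = DETECTORS.get(field_type)
        if detector is not None:
            value = detector.sub(f"<{field_type}>", value)
    return value


def mask_rows(rows: Iterable[Dict[str, Any]], enabled: Set[str]) -> Iterator[Dict[str, Any]]:
    """Redact rows one at a time as they stream back, before the client sees them."""
    for row in rows:
        yield {column: redact_value(value, enabled) for column, value in row.items()}


if __name__ == "__main__":
    rows = [{"email": "ada@example.com", "plan": "pro", "last_login": "2024-05-01"}]
    for masked in mask_rows(rows, {"EMAIL_ADDRESS", "CREDIT_CARD"}):
        print(masked)
```

Because `mask_rows` is a generator, redaction happens row by row on the way out, which mirrors the idea of masking streaming results rather than a stored copy.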
Verify the masking
Run a query that selects a known sensitive column through the agent and confirm the value is redacted in the result. Then query Snowflake directly with an admin tool and confirm the raw value is unchanged at rest. Masking happens on the returned data, not in the warehouse, so production stays intact while the agent only ever sees the masked view.
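That two-sided check can be automated. The sketch below assumes you have already fetched the same row twice, once through the gateway and once directly from Snowflake; how you fetch them is left out on purpose. The function name `verify_masking` and the sample values are hypothetical.

```python
from typing import Any, Dict, List, Set


def verify_masking(
    gateway_row: Dict[str, Any],
    direct_row: Dict[str, Any],
    sensitive: Set[str],
) -> List[str]:
    """Compare one row seen through the gateway with the same row read directly.

    Returns a list of problems; an empty list means the check passed.
    """
    problems = []
    for column, raw in direct_row.items():
        seen = gateway_row.get(column)
        if column in sensitive:
            # The raw value must not appear anywhere in what the client saw.
            if raw is not None and str(raw) in str(seen):
                problems.append(f"{column}: raw value reached the client")
        elif seen != raw:
            # Non-sensitive fields should come back unchanged.
            problems.append(f"{column}: non-sensitive value was altered")
    return problems
```

An empty result covers both halves of the check: the sensitive value is hidden from the client, and the analytic fields still match what is stored in the warehouse.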
Pitfalls
- Treating masking as on by default. On a Snowflake connection it has to be enabled and backed by a DLP provider. Turn it on and define the field types.
- Falling back to a sanitized copy "just in case." That reintroduces the data-lake copy the end-state was built to avoid.
- Masking the columns an agent needs unredacted. Redact identifiers, keep the analytic fields legible.
See how inline data masking works on database connections, and use the getting-started guide to configure your first masked connection.
FAQ
Does the agent ever touch unmasked data?
No. Redaction happens on the returned data before it reaches the client, so the agent only sees the masked result.
Is a copy made to a data lake?
No. The agent queries production Snowflake through the gateway and regulated fields are masked in transit, with no data-lake copy.
What classifies the sensitive fields?
A configured DLP provider, such as Presidio or Google DLP, classifies the streaming content, and the gateway redacts before results return.
hoop.dev is open source. Read the masking code and stand up production-safe agent access at github.com/hoophq/hoop.