A recently off‑boarded contractor’s CI pipeline still calls an AI coding assistant to generate Snowflake queries. The assistant, powered by ChatGPT, receives query results that include raw customer SSNs and credit‑card numbers. Because the pipeline has no guardrail, those sensitive fields could be logged, cached, or even exposed to downstream services.
In that situation, masking the data before it reaches the model is the most dependable way to ensure the AI never sees raw values it doesn’t need. Masking protects personally identifiable information (PII) while still allowing the model to reason about schema, query shape, and performance characteristics.
Why data masking matters for AI coding agents
ChatGPT excels at generating code, but it operates on the data it receives. When an AI agent is fed unfiltered query results, it can inadvertently embed sensitive literals into generated scripts, configuration files, or logs. This creates three concrete risks:
- Leakage through logs. CI systems often capture stdout and stderr. If a query returns a raw SSN, that value is persisted for as long as the logs and their backups are retained.
- Model retention. Data sent to a hosted model may be retained by the provider, and large language models are known to reproduce snippets of data they were trained on, so sensitive values can resurface in unrelated contexts.
- Downstream propagation. Generated code may be checked into repositories, spreading the data beyond the original request.
Masking the data at the point of delivery eliminates these vectors without restricting the AI’s ability to understand column names, data types, or query structure.
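To make the log-leakage risk concrete, here is a minimal sketch of value-based redaction applied to text before it is written to a log. The regular expressions and the `scrub` helper are illustrative assumptions, not part of any product; real detectors usually add validation (for example a Luhn check for card numbers) to limit false positives.

```python
import re

# Hypothetical patterns for illustration only.
# SSN-like: three digits, two digits, four digits, separated by hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Card-like: 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def scrub(text: str, placeholder: str = "***MASKED***") -> str:
    """Replace SSN-like and card-like substrings with a placeholder."""
    text = SSN_RE.sub(placeholder, text)
    return CARD_RE.sub(placeholder, text)


if __name__ == "__main__":
    print(scrub("ssn=123-45-6789 card=4111 1111 1111 1111"))
```

Value-based scrubbing like this is a last line of defense. It only catches values that match a known shape, which is one reason to mask by column at the point of delivery as well.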
Architectural pattern for masking data sent to ChatGPT
The first layer of protection is identity. An OIDC or SAML identity provider authenticates the AI service account and any human operators. The provider issues short‑lived tokens that encode group membership and role information. This step decides *who* can initiate a request, but it does not enforce any content‑level policy.
Next, the request travels through a Layer 7 gateway that sits between the AI client and Snowflake. This gateway is the data path. Because it terminates the protocol, it is the only place where request and response payloads can be inspected, transformed, or blocked before they reach the AI.
Finally, the gateway produces the enforcement outcomes. It applies field‑level masking rules to every response, records the full session for replay, and keeps a log that includes the masked output together with the identity that issued the request.
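The enforcement outcome can be pictured as a log entry that ties the identity from the token to the query and the already-masked response. The following is a rough sketch under stated assumptions: the `audit_record` function and its field names are hypothetical and do not describe any gateway's actual log format.

```python
import json
from datetime import datetime, timezone


def audit_record(subject: str, groups: list[str], query: str,
                 masked_rows: list[dict]) -> str:
    """Serialize one audit entry as a JSON line.

    The caller is expected to pass rows that have already been masked,
    so raw values never reach the log.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "subject": subject,        # who issued the request
        "groups": sorted(groups),  # group claims taken from the token
        "query": query,            # the original SQL text
        "response": masked_rows,   # masked payload only
    }
    return json.dumps(entry, sort_keys=True)


if __name__ == "__main__":
    print(audit_record(
        "ci-bot", ["ai-agents"],
        "SELECT ssn FROM customers",
        [{"ssn": "***MASKED***"}],
    ))
```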
How hoop.dev enforces data masking for Snowflake
hoop.dev implements exactly this pattern. After the identity provider validates the AI service account, hoop.dev receives the OIDC token, extracts the groups, and determines the masking policy that applies to the Snowflake connection. The gateway then proxies the SQL client traffic:
- When Snowflake returns rows, hoop.dev scans the result set for columns that match configured sensitive patterns (for example, columns named ssn or credit_card).
- For each matching field, hoop.dev replaces the raw value with a placeholder such as ***MASKED*** before the data is handed to the ChatGPT client.
- The entire exchange – the original query, the masked response, and the identity that issued the request – is recorded in a session log that can be replayed for forensic analysis.
- If a request attempts to retrieve an unmasked column that the policy forbids, hoop.dev can block the command outright or hold it for human approval.
Because the masking happens inside the gateway, the AI never sees the sensitive values. The AI can still generate useful code, because column names and data types remain visible. The audit log provides evidence that every request complied with the organization’s data‑handling rules.
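The column-matching step described above can be sketched in a few lines. This is an illustration of the technique, not hoop.dev's implementation: `mask_rows` and `SENSITIVE_COLUMNS` are hypothetical names, and the comparison is case-insensitive because Snowflake reports unquoted identifiers in upper case.

```python
SENSITIVE_COLUMNS = {"ssn", "credit_card"}  # names from the example above
PLACEHOLDER = "***MASKED***"


def mask_rows(columns, rows, sensitive=SENSITIVE_COLUMNS):
    """Return rows with values in sensitive columns replaced by a placeholder.

    `columns` is the list of column names for the result set and `rows`
    is a list of tuples in the same column order.
    """
    # Work out which positions hold sensitive columns once per result set.
    masked_idx = {i for i, name in enumerate(columns) if name.lower() in sensitive}
    return [
        tuple(PLACEHOLDER if i in masked_idx else value
              for i, value in enumerate(row))
        for row in rows
    ]


if __name__ == "__main__":
    cols = ["ID", "NAME", "SSN"]
    data = [(1, "Ada", "123-45-6789")]
    print(mask_rows(cols, data))
```

Column names and row shape pass through unchanged, so the model still sees the schema it needs. Matching on names alone misses aliased or computed columns, so a production rule set would likely combine it with value-based detection.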
Getting started with hoop.dev
To put this architecture into production, follow the high‑level steps outlined in the official documentation:
- Deploy the hoop.dev gateway using the provided Docker Compose quick‑start or a Kubernetes manifest.
- Configure Snowflake as a connection, supplying the service account credentials that the gateway will use.
- Define masking rules for the sensitive columns you want protected.
- Register the AI service account in your OIDC provider and grant it the minimal role needed to invoke the gateway.
- Update your CI pipeline or application code to point the ChatGPT client at the hoop.dev endpoint instead of directly at Snowflake.
All of these details, including exact configuration syntax and best‑practice recommendations, are covered in the getting‑started guide and the broader learn section. The repository on GitHub contains the full source code and deployment manifests.
FAQ
Does hoop.dev store any raw data from Snowflake?
No. The gateway only holds credentials for the duration of the connection and discards raw result rows after applying masking. The audit log contains the masked payload together with metadata about who made the request.
Can I apply different masking policies per user or per project?
Yes. Because hoop.dev evaluates the OIDC token on each request, you can tie masking rule sets to groups, roles, or even individual service accounts, enabling fine‑grained control.
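As a rough sketch of group-based policy selection, the snippet below maps group claims to sets of columns to mask. The group names, the `POLICIES` table, and the rule for combining groups are all assumptions made for illustration. Here a caller in several groups gets the union of their rule sets, and a caller with no recognized group gets the default, so the lookup fails closed.

```python
# Hypothetical group names and rule sets.
POLICIES = {
    "ai-agents": {"ssn", "credit_card", "email"},
    "data-engineers": {"ssn", "credit_card"},
    "compliance": set(),  # no masking for this group
}
DEFAULT_POLICY = {"ssn", "credit_card", "email"}


def columns_to_mask(groups):
    """Pick the columns to mask from the group claims in a validated token."""
    matched = [POLICIES[g] for g in groups if g in POLICIES]
    if not matched:
        # Unknown or missing groups: apply the strictest default.
        return set(DEFAULT_POLICY)
    # Several groups: the most restrictive combination wins.
    return set().union(*matched)


if __name__ == "__main__":
    print(sorted(columns_to_mask(["data-engineers"])))
```

Taking the union is one possible design choice. A deployment that wants the least restrictive group to win would intersect the sets instead.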
What happens if a query tries to bypass the mask?
hoop.dev can block the command and optionally route it to a human approver. The blocked attempt is still logged for audit purposes.
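The block-or-approve decision can be illustrated with a deliberately naive check. Everything here is hypothetical: the `Decision` enum, the `FORBIDDEN` set, and the identifier scan are stand-ins, and a real gateway would parse the SQL rather than tokenize it with a regular expression.

```python
import re
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REVIEW = "review"  # hold for a human approver
    BLOCK = "block"


FORBIDDEN = {"ssn", "credit_card"}  # hypothetical columns the policy forbids


def decide(query: str, forbidden=FORBIDDEN, approval_enabled=True) -> Decision:
    """Classify a query by whether it names a forbidden column.

    Naive on purpose: it also matches identifiers inside string literals
    and cannot see columns hidden behind SELECT * or a view.
    """
    identifiers = {tok.lower() for tok in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", query)}
    if identifiers & forbidden:
        return Decision.REVIEW if approval_enabled else Decision.BLOCK
    return Decision.ALLOW


if __name__ == "__main__":
    print(decide("SELECT id, SSN AS x FROM customers"))
```

Whichever outcome is chosen, the attempt itself would still be written to the audit log.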
Explore the source code, contribute improvements, or file issues at https://github.com/hoophq/hoop.