Data masking for AI agents on Postgres

What happens to a Social Security number after an AI agent reads it from Postgres? It lands in a prompt, and from there maybe a vector store, maybe a third-party model's logs, and a chain of systems you do not fully control. Data masking for AI agents on Postgres is really a question about where sensitive values stop. The answer should be: at the database connection, before they ever reach the agent.

Agents are worse than humans here because they fan data out fast. A person reads a record once. An agent can pull a thousand rows into a context window in a second. Masking at the connection is the control that keeps regulated columns out of that flow.

What data masking means here

Data masking for an agent on Postgres means sensitive fields in query results are redacted before the agent receives them. The agent still gets a working result set, with its shape and non-sensitive columns intact, but the protected values come back as redaction tokens rather than cleartext.
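A minimal Python sketch of that contract: the result set keeps its columns and row count, and only values in columns marked sensitive are swapped for a placeholder. The function name, the `<REDACTED>` token, and keying the rule on column names are simplifications for illustration; the gateway described in this post classifies values with a DLP provider instead.

```python
from typing import Sequence

# Placeholder token chosen for illustration; a real gateway's token format may differ.
REDACTED = "<REDACTED>"


def mask_result_set(
    columns: Sequence[str], rows: Sequence[tuple], sensitive: set[str]
) -> list[tuple]:
    """Return rows with the same shape, replacing values in sensitive columns."""
    positions = {i for i, name in enumerate(columns) if name in sensitive}
    return [
        tuple(
            # NULLs carry no data, so they pass through unchanged.
            REDACTED if i in positions and value is not None else value
            for i, value in enumerate(row)
        )
        for row in rows
    ]


cols = ["ticket_id", "customer_email", "status"]
rows = [(1, "ana@example.com", "open"), (2, None, "open")]
print(mask_result_set(cols, rows, {"customer_email"}))
# [(1, '<REDACTED>', 'open'), (2, None, 'open')]
```

The agent can still count, sort, and join on ticket_id; it just never holds the email.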

Why the usual approaches leak

Masked views assume every agent query hits the view and not the base table. Application-layer redaction assumes the agent goes through your app, which agents using direct database tools do not. Column-level grants stop a column from returning at all, which often breaks the task instead of protecting it. None of these sit on the path every query actually takes.
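To make the masked-view gap concrete, here is a small Python sketch that uses the standard-library `sqlite3` module as a stand-in for Postgres. The view semantics are the same for this point, though `substr` with a negative start is SQLite behavior. The table, view, and values are hypothetical. A credential that can read the base table simply skips the view.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT);
    INSERT INTO customers VALUES (1, 'Ana', '123-45-6789');
    -- Masked view: exposes only the last four digits of the SSN.
    CREATE VIEW customers_masked AS
        SELECT id, name, '***-**-' || substr(ssn, -4) AS ssn FROM customers;
""")

via_view = conn.execute("SELECT ssn FROM customers_masked").fetchone()[0]
via_base = conn.execute("SELECT ssn FROM customers").fetchone()[0]
print(via_view)  # ***-**-6789
print(via_base)  # 123-45-6789, the mask only holds if every query uses the view
```

In Postgres you would try to close this with grants on the base table, which is exactly the kind of per-schema maintenance the gateway approach avoids.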

The gateway boundary

hoop.dev is an open-source Layer 7 access gateway. Its network-resident agent (a hoop.dev component, not the AI agent) speaks the Postgres wire protocol, so it sits on the exact path between the AI agent and the database. As Postgres streams results back, hoop.dev passes them to a configured DLP provider (Presidio or Google DLP) and redacts classified fields before the AI agent sees them. Because this runs on the connection, it does not depend on which query the agent wrote or whether it went through your application.
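The flow in that paragraph, rows streaming back and each passing through a classifier before it is forwarded, can be sketched as a Python generator. This is not hoop.dev's code: `regex_detector` is a hypothetical stand-in for a real DLP provider such as Presidio or Google DLP, and the `<ENTITY>` token format is an illustrative choice.

```python
import re
from typing import Callable, Iterable, Iterator

Span = tuple[int, int, str]  # (start, end, entity_type)
Detector = Callable[[str], list[Span]]

# Hypothetical stand-in for a DLP provider such as Presidio or Google DLP.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def regex_detector(text: str) -> list[Span]:
    return [
        (m.start(), m.end(), entity)
        for entity, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def redact(text: str, spans: list[Span]) -> str:
    # Replace right to left so earlier offsets stay valid.
    # Assumes spans do not overlap; a production redactor would merge them first.
    for start, end, entity in sorted(spans, reverse=True):
        text = text[:start] + f"<{entity}>" + text[end:]
    return text


def redact_stream(rows: Iterable[tuple], detect: Detector) -> Iterator[tuple]:
    """Yield rows one at a time as they arrive, redacting entities in text values."""
    for row in rows:
        yield tuple(redact(v, detect(v)) if isinstance(v, str) else v for v in row)


upstream = iter([(7, "ssn 123-45-6789 on file"), (8, "no pii here")])
for row in redact_stream(upstream, regex_detector):
    print(row)
# (7, 'ssn <US_SSN> on file')
# (8, 'no pii here')
```

Using a generator mirrors the streaming behavior: each row is redacted as it passes, so the full result set never has to be buffered in cleartext.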

A worked example

Say an agent queries a support table:

SELECT ticket_id, customer_email, body FROM tickets WHERE status = 'open';

With masking configured for email entities, ticket_id and body return normally (unless body itself contains an email address) while customer_email comes back redacted. The agent can triage tickets; it cannot exfiltrate the email list. The redaction happens in the result stream, not in a view you had to build and maintain.
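A sketch of the same example in Python, assuming hypothetical rows for the query above and a simple email regex in place of a real classifier:

```python
import re

# Simple email pattern standing in for a DLP classifier.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_emails(value):
    # Only text values are scanned; ids and NULLs pass through unchanged.
    if isinstance(value, str):
        return EMAIL.sub("<EMAIL_ADDRESS>", value)
    return value


# Hypothetical rows for: SELECT ticket_id, customer_email, body FROM tickets WHERE status = 'open';
rows = [
    (101, "ana@example.com", "Cannot reset password"),
    (102, "raj@example.org", "Invoice shows the wrong plan"),
]
masked = [tuple(mask_emails(v) for v in row) for row in rows]
for row in masked:
    print(row)
# (101, '<EMAIL_ADDRESS>', 'Cannot reset password')
# (102, '<EMAIL_ADDRESS>', 'Invoice shows the wrong plan')
```

Because detection looks at values rather than column names, an email address typed into body would be redacted too, which is usually what you want.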

Setting it up

The setup is short. Install the hoop.dev gateway, run its network-resident agent next to Postgres, and register the connection with its host, port, database, user, password, and SSL mode. Then configure a DLP provider for that connection and turn on data masking for the entity types you care about. From then on, the AI agent connects to the hoop.dev endpoint instead of the raw database, and redaction applies to every result it pulls. The getting started guide covers the install and first connection.
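The fields that paragraph lists can be modeled as a small Python record, mainly to show two points: masking needs both a DLP provider and at least one entity type, and the AI agent's connection string points at the gateway rather than the database, with no database password in it. `GatewayConnection`, `agent_dsn`, the provider names, and the hostnames are hypothetical, not hoop.dev's configuration schema; the DSN uses libpq's keyword/value format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GatewayConnection:
    """Hypothetical record of a registered Postgres connection (not hoop.dev's schema)."""

    host: str
    port: int
    database: str
    user: str
    password: str  # registered with the gateway, never handed to the AI agent
    sslmode: str = "require"  # illustrative default for this sketch
    dlp_provider: Optional[str] = None  # illustrative values: "presidio", "google_dlp"
    masked_entities: set[str] = field(default_factory=set)

    def masking_active(self) -> bool:
        # No DLP provider means no masking, whatever entity types are listed.
        return self.dlp_provider is not None and bool(self.masked_entities)


def agent_dsn(gateway_host: str, gateway_port: int, database: str) -> str:
    """libpq keyword/value string pointing the AI agent at the gateway endpoint.

    How the agent authenticates to the gateway is left out of this sketch.
    """
    return f"host={gateway_host} port={gateway_port} dbname={database}"


conn = GatewayConnection(
    "db.internal", 5432, "support", "svc_support", "s3cret",
    dlp_provider="presidio", masked_entities={"EMAIL_ADDRESS"},
)
print(conn.masking_active())  # True
print(agent_dsn("gateway.internal", 5433, "support"))
# host=gateway.internal port=5433 dbname=support
```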

Notice what you did not do: you did not build masked views, you did not change the application, and you did not hand the agent a privileged credential. Data masking became a property of the connection rather than a project spread across schema and code.

Pitfalls

  • No DLP provider means no masking. Configure Presidio or Google DLP first.
  • Free-text columns hide PII too. Classify free-text fields like body, not just obvious ones like email.
  • Masking protects results, not intent. Pair it with access scoping so agents only reach tables they need.
  • Verify against representative data. A classifier tuned for one locale or format can miss values that look different in your tables.
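The last pitfall is easy to check before you rely on a classifier. A minimal Python harness, assuming a dash-formatted US SSN regex as the detector under test and a hand-picked list of hypothetical sample values, reports which known-sensitive values slip through:

```python
import re

# Detector under test: a US SSN pattern that expects dashes.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_misses(samples: list[str], pattern: re.Pattern) -> list[str]:
    """Return the known-sensitive samples the pattern fails to detect."""
    return [s for s in samples if not pattern.search(s)]


# Hypothetical representative values, every one of which contains an SSN.
samples = [
    "123-45-6789",                  # canonical format
    "Customer SSN is 123-45-6789",  # buried in free text
    "123 45 6789",                  # spaces instead of dashes
    "123456789",                    # no separators
]
print(find_misses(samples, SSN))
# ['123 45 6789', '123456789']
```

Any non-empty result means the classifier needs tuning, or an additional recognizer, before masking can be trusted on that column.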

FAQ

Is masking on by default for Postgres?

No. You enable it on the connection and configure a DLP provider. Masking is built into the Postgres proxy, but it only takes effect once that provider is set.

Can an agent bypass masking by rewriting its query?

No. Masking runs on the result stream at the gateway, so it applies regardless of how the agent phrases the query, as long as the connection goes through hoop.dev.
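One reason result-stream masking holds up against rewritten queries is that it looks at values rather than at query text or column names. This Python sketch contrasts a hypothetical name-keyed rule with a value-keyed one when the agent aliases the column. The caveat from the pitfalls still applies: a value-keyed rule only catches the forms its classifier recognizes.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_by_column_name(columns, row, sensitive_names):
    # Rule keyed on column names: breaks as soon as the query renames the column.
    return tuple("<REDACTED>" if c in sensitive_names else v for c, v in zip(columns, row))


def mask_by_value(row):
    # Rule keyed on what the value looks like: independent of the query text.
    return tuple(EMAIL.sub("<EMAIL_ADDRESS>", v) if isinstance(v, str) else v for v in row)


# Hypothetical result of: SELECT customer_email AS x FROM tickets;
columns, row = ["x"], ("ana@example.com",)
print(mask_by_column_name(columns, row, {"customer_email"}))  # ('ana@example.com',)
print(mask_by_value(row))  # ('<EMAIL_ADDRESS>',)
```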

Does the database see masked or real data?

Postgres returns real rows. Redaction happens between the database and the agent, so the stored data is never altered and unmasked consumers on other connections are unaffected.

Can I mask some columns for one agent and not another?

Yes. Masking is configured per connection and policy, so different identities can go through connections with different redaction rules against the same underlying tables.
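A sketch of the idea in Python: two hypothetical connections read the same row but carry different sets of entity types to redact. The connection names, regexes, and policy mapping are illustrative assumptions, not hoop.dev's policy format.

```python
import re

PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d -]{7,}\d"),
}

# Hypothetical per-connection policies over the same tickets table.
POLICIES = {
    "triage-agent": {"EMAIL_ADDRESS", "PHONE_NUMBER"},
    "billing-agent": {"PHONE_NUMBER"},
}


def apply_policy(connection: str, row: tuple) -> tuple:
    """Redact only the entity types the connection's policy lists."""
    entities = POLICIES[connection]
    out = []
    for v in row:
        if isinstance(v, str):
            # Sorted for a deterministic replacement order.
            for entity in sorted(entities):
                v = PATTERNS[entity].sub(f"<{entity}>", v)
        out.append(v)
    return tuple(out)


row = (101, "ana@example.com", "+1 555 010 2030")
print(apply_policy("triage-agent", row))   # (101, '<EMAIL_ADDRESS>', '<PHONE_NUMBER>')
print(apply_policy("billing-agent", row))  # (101, 'ana@example.com', '<PHONE_NUMBER>')
```

The table and the stored row never change; only the view each connection gets of the result stream does.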

hoop.dev is open source. Explore the masking pipeline and run it against your own Postgres connection from the hoop.dev repository on GitHub. For the broader model, see hoop.dev/learn.
