Data masking for AI agents on MySQL

An agent runs SELECT * FROM users WHERE status = 'churned' to summarize churn, and the result set carries every email, phone number, and billing token in the table. The query was reasonable. The data that came back was not. That gap is the control teams skip first when they wire an AI agent to a production database, and it is the one that matters most.

Data masking for AI agents on MySQL is the practice of redacting sensitive columns in the result stream before any row reaches the agent. The agent still gets to run its query and reason over real structure and real counts. It just never sees the raw value in a column you marked sensitive. Most teams bolt on access rules and recording, then forget that the agent is reading plaintext PII on every successful query.

Why data masking is the control that gets skipped

Access control answers who can connect. Recording answers what happened. Neither stops the agent from pulling a column of social security numbers into its context window, where it may be logged, embedded, or echoed into a downstream prompt. Once a value lands in an agent's context, you have lost the ability to say where it went.

MySQL has no native concept of redacting a column for one caller and not another inside a normal connection. Views and column grants help, but they multiply with every new agent and every new use case, and they break the moment someone needs the real value. The redaction has to happen on the connection itself, between MySQL and the agent, not inside a schema you have to maintain forever.

Where the masking boundary sits

hoop.dev is an open-source Layer 7 access gateway that proxies the MySQL wire protocol. The AI agent connects to hoop.dev as if it were MySQL, hoop.dev forwards the query to the real database through its own network-resident agent (a gateway component, not the AI agent), and the result set flows back through the gateway. That return path is where masking runs. The streaming rows pass to a configured DLP provider (Microsoft Presidio or Google DLP) for classification, sensitive fields are redacted, and only the masked rows continue to the client.
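
To make the return path concrete, here is a minimal Python sketch of redaction applied to a stream of rows. It is not hoop.dev's implementation: the two regular expressions are a crude, hypothetical stand-in for a real DLP classifier such as Presidio or Google DLP, and the names PATTERNS, redact_value, and mask_stream are invented for illustration.

```python
import re
from typing import Iterable, Iterator

# Hypothetical stand-in for a DLP classifier. These patterns are
# illustrative only and will miss or over-match real-world data.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_value(value):
    """Replace every match of a sensitive pattern inside a string value."""
    if not isinstance(value, str):
        return value  # numbers, None, etc. pass through untouched
    for entity, pattern in PATTERNS.items():
        value = pattern.sub(f"<{entity}>", value)
    return value


def mask_stream(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield rows one at a time with sensitive values redacted.

    Working row by row means the caller never sees an unmasked row,
    and the full result set is never buffered in memory.
    """
    for row in rows:
        yield {column: redact_value(value) for column, value in row.items()}


if __name__ == "__main__":
    upstream = [
        {"id": 1, "email": "ana@example.com", "status": "churned"},
        {"id": 2, "email": "raj@example.org", "status": "churned"},
    ]
    for masked in mask_stream(upstream):
        print(masked)
```

The row count and the non-sensitive columns pass through unchanged, which is why the agent can still reason over structure and counts.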

One detail decides whether this works: the masking has to execute somewhere the AI agent cannot reconfigure. If redaction lives in the same process the agent drives, the agent can turn it off. Running it inside the gateway, on the connection, keeps the rule outside the agent's reach.

Set it up step by step

  1. Register the MySQL connection in hoop.dev with HOST, PORT, USER, PASS, and DB. The connection-configured database user is the identity hoop.dev uses to reach MySQL.
  2. Attach a DLP provider so the masking plugin has a classifier to call.
  3. Define the sensitive entity types you want redacted in returned data: email, phone, national ID, credit card, names.
  4. Point the agent's MySQL client at the hoop.dev endpoint instead of the database host. Nothing in the agent code changes beyond the connection string.
  5. Run a read query and confirm the flagged columns come back redacted while the rest of the row is intact.

After that, every agent on that connection inherits the same redaction. You add an agent, not a new set of database views.
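
The final verification step can be automated with a small check. The sketch below assumes you have already fetched some rows through the gateway with whatever MySQL client the agent uses, as a list of dicts. The column names, the raw-format patterns, and the unmasked_cells helper are assumptions for illustration, not part of hoop.dev.

```python
import re

# Assumed raw formats for the flagged columns; adjust to your own data.
RAW_FORMATS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def unmasked_cells(rows, raw_formats=RAW_FORMATS):
    """Return (row_index, column) for every flagged cell that still looks raw."""
    leaks = []
    for index, row in enumerate(rows):
        for column, pattern in raw_formats.items():
            value = row.get(column)
            if isinstance(value, str) and pattern.search(value):
                leaks.append((index, column))
    return leaks


if __name__ == "__main__":
    # Rows as they might look after passing through the gateway.
    # The "[REDACTED]" placeholder is an assumption about the mask format.
    fetched = [
        {"id": 1, "plan": "pro", "email": "[REDACTED]", "phone": "[REDACTED]"},
    ]
    assert unmasked_cells(fetched) == [], "raw PII reached the client"
    print("flagged columns are redacted")
```

A check like this fits in a smoke test that runs whenever a connection or its masking policy changes.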

A worked example

Say an agent answers support questions and needs to look up an account by ID. The query is SELECT id, plan, email, phone FROM accounts WHERE id = 4821. Without redaction, the agent receives the real email and phone and may write them into its reply, its logs, and the next prompt it builds. With masking configured for email and phone on that connection, the same query returns the plan and the ID intact while the contact fields come back redacted. The agent answers the plan question correctly and never holds the personal data. Nothing in the agent changed; the connection did the work.
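
Here is the same example as a before-and-after in Python. This is a simplification: it masks by column name, whereas the gateway classifies values through a DLP provider. The row values, the SENSITIVE_COLUMNS set, and the "[REDACTED]" placeholder are all invented for illustration.

```python
SENSITIVE_COLUMNS = {"email", "phone"}  # assumption: entities masked on this connection
MASK = "[REDACTED]"                     # assumption: the real placeholder may differ


def mask_row(row):
    """Return a copy of the row with sensitive columns replaced by a placeholder."""
    return {
        column: MASK if column in SENSITIVE_COLUMNS and value is not None else value
        for column, value in row.items()
    }


# What MySQL returns for: SELECT id, plan, email, phone FROM accounts WHERE id = 4821
raw = {"id": 4821, "plan": "pro", "email": "dana@example.com", "phone": "+1-555-0100"}

print(mask_row(raw))
# {'id': 4821, 'plan': 'pro', 'email': '[REDACTED]', 'phone': '[REDACTED]'}
```

The agent can still answer "which plan is account 4821 on?" from the masked row, because id and plan are untouched.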

Pitfalls to watch

  • Assuming masking is on everywhere by default. Support is per connection. On MySQL it runs natively, but you still configure which entities to redact. An empty policy masks nothing.
  • Masking only on read paths and ignoring writes. An agent that can write can also exfiltrate by copying sensitive columns into an unmasked table. Route risky writes for approval rather than relying on read-side masking alone.
  • Treating masking as the whole story. Pair it with scoped, just-in-time access so the agent reaches only the tables its task needs.
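
The write-path pitfall can be sketched as a routing decision. The heuristic below is deliberately crude and conservative, and it is not how hoop.dev decides: only a single plain SELECT or SHOW statement is treated as a read, and everything else is sent for approval. The needs_approval function and its rules are assumptions, and a production gateway would parse the SQL rather than inspect keywords.

```python
import re

# Assumption: only these statement types are treated as plain reads.
READ_ONLY_KEYWORDS = {"SELECT", "SHOW"}

# SELECT ... INTO OUTFILE / DUMPFILE writes data to disk, so it is not a plain read.
_EXPORT = re.compile(r"\bINTO\s+(?:OUTFILE|DUMPFILE)\b", re.IGNORECASE)


def needs_approval(sql: str) -> bool:
    """Return True unless the statement is a single, plain read.

    Errs on the side of approval: stacked statements, block comments
    (MySQL can execute /*! ... */ comments), and unknown keywords all
    count as risky.
    """
    body = sql.strip().rstrip(";").strip()
    if not body or ";" in body or "/*" in body:
        return True
    keyword = body.split(None, 1)[0].upper()
    if keyword not in READ_ONLY_KEYWORDS:
        return True
    return bool(_EXPORT.search(body))


if __name__ == "__main__":
    for statement in (
        "SELECT id, plan FROM accounts WHERE id = 4821",
        "INSERT INTO scratch SELECT email FROM users",
        "SELECT email FROM users INTO OUTFILE '/tmp/e.csv'",
    ):
        print(needs_approval(statement), statement)
```

The second statement is the exfiltration path described above: it copies a sensitive column into another table, so read-side masking never sees it.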

FAQ

Does data masking change the agent's query?

No. The query runs against real MySQL data. Redaction happens on the result stream, so counts, joins, and structure stay correct while the sensitive values are replaced.

Can the agent disable masking?

Not when it runs in the gateway. The agent connects through hoop.dev and never holds the database credential or the masking config, so it cannot turn redaction off.

Does masking work on AWS RDS MySQL?

Yes. The same wire-protocol path applies, and RDS connections can use per-user IAM auth on the web-app path for identity, with masking on the return stream.

To see how the masking plugin and the MySQL proxy fit together, read about just-in-time access for the same MySQL connection and the broader model for AI agent access. The gateway is open source. Clone it, run it against a test database, and watch a column come back redacted: hoop.dev on GitHub.
