A tokenized column did exactly what it was designed to do, and an agent still walked away with every customer's real email. The data was tokenized at rest. But the agent had a legitimate query path, the application detokenized on read the way it always does, and the cleartext flowed straight into the model's context. The vault worked. The boundary it was protecting was never the one the agent crossed.
That is the failure that separates data masking from tokenization. They sound interchangeable and they are not. One protects stored values. The other controls what a caller is allowed to see in a response. For an agent that reaches data through a live query, only one of them stands in the path that matters.
What tokenization does
Tokenization replaces a sensitive value with a surrogate and keeps the mapping in a separate, guarded store. The real value lives in the token vault; the database holds tokens. It is a strong control for data at rest and for shrinking the systems that ever touch raw values. If a tokenized table leaks, the attacker gets tokens, not card numbers.
The limit is detokenization. Tokenization is built to be reversible for authorized callers, because the application needs the real value to do its job. The moment a caller is authorized to detokenize, tokenization steps out of the way and hands back cleartext. An agent on an authorized path is, by definition, an authorized caller. The token model was never designed to decide that this particular caller, on this particular query, should see masked output instead of real values.
What data masking does
Data masking transforms sensitive values in the response itself: the SSN comes back as XXX-XX-1234, the email as a redacted form, the PAN as its last four digits. It operates at the point of access, on the result a caller receives, deciding per request what that caller is allowed to see. The underlying row is untouched; what changes is the view.
That is precisely the control the tokenization story was missing. Masking does not ask whether the data is encrypted at rest. It asks whether this caller, on this query, should see the real value, and it redacts before the response leaves the boundary. For an agent, that is the difference between "the data was protected in the database" and "the agent never received the cleartext."
They solve different problems
This is not a contest with one winner. Tokenization minimizes where raw values live and protects data at rest. Data masking controls what a live caller sees at query time. A mature setup uses both: tokenize to shrink the blast radius of a stored-data breach, and mask so that authorized-but-untrusted callers, agents very much included, get redacted output instead of the real thing.
