The query returned rows you should never have seen. Names, emails, IDs. PII scattered in plain view. On Databricks, an unguarded query like this is the breach point, where compliance fails and trust evaporates.
A PII catalog in Databricks is more than a feature. It is the map of every sensitive field across every table, schema, and workspace. Building it starts with scanning metadata: column names, types, and comments, plus sampled values where a name alone is ambiguous. Automated classification flags the columns that hold personally identifiable information, and you tag each one with a standard label such as name, address, SSN, or email. Store those tags in Unity Catalog or your metadata layer so every engineer, analyst, and pipeline knows where the risks live.
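A minimal sketch of the classify-and-tag step, assuming name-based rules only. The patterns, the `pii` tag key, and the helper names `classify_column` and `tag_statements` are illustrative, not part of any Databricks API. The generated statements use the Unity Catalog `ALTER TABLE ... ALTER COLUMN ... SET TAGS` form, so check them against your workspace before running them.

```python
import re

# Hypothetical name-based rules; a real classifier would usually sample values too.
PII_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "ssn": re.compile(r"(^|_)ssn($|_)|social_security", re.IGNORECASE),
    "name": re.compile(r"(^|_)(first|last|full)_?name($|_)", re.IGNORECASE),
    "address": re.compile(r"address|street|zip_?code|postal", re.IGNORECASE),
}


def classify_column(column_name):
    """Return the first PII label whose pattern matches the column name, else None."""
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(column_name):
            return label
    return None


def tag_statements(table, columns, tag_key="pii"):
    """Build one SET TAGS statement per column that was classified as PII."""
    statements = []
    for column in columns:
        label = classify_column(column)
        if label is not None:
            statements.append(
                f"ALTER TABLE {table} ALTER COLUMN {column} "
                f"SET TAGS ('{tag_key}' = '{label}')"
            )
    return statements
```

In a notebook, each returned string could be passed to `spark.sql(...)`. Keeping the classification separate from the SQL generation makes it easy to review the proposed tags before anything is written to the catalog.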
Once the PII catalog exists, data masking becomes the weapon. Databricks supports column-level controls, including Unity Catalog column masks and dynamic views, that can replace sensitive fields with nulls, hashes, or obfuscated tokens. Masking rules should be role-based: authorized users see the raw value, everyone else sees a masked version. This keeps pipelines intact while helping you stay compliant with GDPR, CCPA, and internal policies.
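As a rough illustration of role-based masking, the sketch below generates the SQL for a dynamic view that shows raw values to members of one group and a SHA-256 hash to everyone else. The helper `masked_view_sql`, the group name, and the table and view names are assumptions made for this example. `is_account_group_member` and `sha2` are Databricks SQL functions, but the exact masking expression should follow your own policy.

```python
def masked_view_sql(view, table, columns, pii_columns, reader_group):
    """Build CREATE VIEW SQL that hashes PII columns for users outside reader_group."""
    select_items = []
    for column in columns:
        if column in pii_columns:
            # Group members see the raw value; everyone else sees a hash.
            select_items.append(
                f"CASE WHEN is_account_group_member('{reader_group}') "
                f"THEN {column} ELSE sha2(CAST({column} AS STRING), 256) END AS {column}"
            )
        else:
            # Non-sensitive columns pass through unchanged.
            select_items.append(column)
    select_list = ",\n  ".join(select_items)
    return f"CREATE OR REPLACE VIEW {view} AS\nSELECT\n  {select_list}\nFROM {table}"


# Example with hypothetical names; print the SQL for review before executing it.
print(
    masked_view_sql(
        view="main.sales.customers_masked",
        table="main.sales.customers",
        columns=["id", "email"],
        pii_columns={"email"},
        reader_group="pii_readers",
    )
)
```

Because the view keeps the original column names, downstream pipelines can point at it without code changes. The set of PII columns could come straight from the tags stored in the catalog.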