PII leakage prevention in Databricks starts with knowing exactly where sensitive data lives. Use Unity Catalog tags to classify tables and columns. Tag PII fields when they are created, not after a breach. Make data lineage visible so you can trace where tagged columns flow downstream. Audit and verify those tags regularly so that no untagged PII slips past review.
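Tagging can be scripted so it runs in the same job that creates a table, and the same script can check for gaps. The Python sketch below assumes a hand-maintained classification map (`PII_CLASSIFICATION`, hypothetical) and emits Unity Catalog `ALTER TABLE ... ALTER COLUMN ... SET TAGS` statements. It also flags columns whose names look like PII but carry no `pii` tag. The tag key `pii`, the name pattern, the table names, and the helper functions are illustrative assumptions; in practice the column and tag lists could be read from the catalog's `information_schema` views.

```python
import re

# Hypothetical classification map: fully qualified table -> {column: PII category}.
PII_CLASSIFICATION = {
    "main.crm.customers": {"email": "email", "phone_number": "phone"},
}

# Column-name fragments that often indicate PII (an assumption; tune for your data).
PII_NAME_PATTERN = re.compile(r"email|phone|ssn|birth|address|passport", re.IGNORECASE)


def tag_statements(classification):
    """Yield one Unity Catalog SET TAGS statement per classified column."""
    for table, columns in classification.items():
        for column, category in columns.items():
            yield f"ALTER TABLE {table} ALTER COLUMN {column} SET TAGS ('pii' = '{category}')"


def untagged_pii_candidates(columns, tags):
    """Return (table, column) pairs that look like PII but have no 'pii' tag.

    columns: iterable of (table, column) pairs
    tags:    iterable of (table, column, tag_name) triples
    """
    tagged = {(table, column) for table, column, tag_name in tags if tag_name == "pii"}
    return sorted(
        (table, column)
        for table, column in columns
        if PII_NAME_PATTERN.search(column) and (table, column) not in tagged
    )


if __name__ == "__main__":
    for stmt in tag_statements(PII_CLASSIFICATION):
        print(stmt)  # in a Databricks notebook this could be run with spark.sql(stmt)
    columns = [
        ("main.crm.customers", "id"),
        ("main.crm.customers", "email"),
        ("main.crm.customers", "home_address"),
    ]
    tags = [("main.crm.customers", "email", "pii")]
    print(untagged_pii_candidates(columns, tags))
```

The name-pattern check is only a heuristic: it catches obvious misses such as `home_address`, but it will not find PII buried in free-text columns, so it complements manual review rather than replacing it.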
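Access control is your next barrier. Assign permissions by job function, not to individuals case by case: grant privileges to groups such as analysts or data engineers, and manage who belongs to each group. Apply the principle of least privilege: no user should see more than they need to perform their work. Use Unity Catalog privileges to limit read, write, and manage rights at the catalog, schema, and table level, and use column masks or dynamic views to restrict individual columns. Avoid wide-open clusters and shared credentials; they destroy accountability because actions can no longer be traced to a single person.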
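Granting by job function can be expressed as data: a map from groups to the minimum privileges each one needs, rendered into `GRANT` statements. The sketch below assumes hypothetical group names (`analysts`, `data_engineers`, `pii_readers`), table names, and helper functions. It rejects two anti-patterns: grants to an individual user (detected by the heuristic that user principals are email addresses) and `ALL PRIVILEGES` for non-admin groups. Because Unity Catalog restricts individual columns through column masks rather than column-level grants, it also emits a mask function built on `is_account_group_member`; the mask function name and the `'***'` replacement value are assumptions.

```python
# Hypothetical job-function groups mapped to the minimum privileges each needs.
# Reading a table also requires USE CATALOG and USE SCHEMA on its parents.
ROLE_GRANTS = {
    "analysts": [
        ("USE CATALOG", "CATALOG", "main"),
        ("USE SCHEMA", "SCHEMA", "main.crm"),
        ("SELECT", "TABLE", "main.crm.customers"),
    ],
    "data_engineers": [
        ("USE CATALOG", "CATALOG", "main"),
        ("USE SCHEMA", "SCHEMA", "main.crm"),
        ("SELECT", "TABLE", "main.crm.customers"),
        ("MODIFY", "TABLE", "main.crm.customers"),
    ],
}

# Privileges too broad for an ordinary job function (assumption: admins only).
BROAD_PRIVILEGES = {"ALL PRIVILEGES"}


def grant_statements(role_grants, admin_groups=frozenset()):
    """Render GRANT statements, refusing grants that break least privilege."""
    statements = []
    for principal, grants in role_grants.items():
        # Heuristic: user principals are email addresses; grant to groups instead.
        if "@" in principal:
            raise ValueError(f"grant to individual user {principal!r}; use a group")
        for privilege, securable, name in grants:
            if privilege in BROAD_PRIVILEGES and principal not in admin_groups:
                raise ValueError(f"{privilege} for {principal!r} breaks least privilege")
            statements.append(f"GRANT {privilege} ON {securable} {name} TO `{principal}`")
    return statements


def column_mask_statements(table, column, reader_group, function_name):
    """Mask a STRING column for everyone outside reader_group."""
    return [
        f"CREATE OR REPLACE FUNCTION {function_name}({column} STRING) RETURNS STRING "
        f"RETURN CASE WHEN is_account_group_member('{reader_group}') "
        f"THEN {column} ELSE '***' END",
        f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {function_name}",
    ]


if __name__ == "__main__":
    for stmt in grant_statements(ROLE_GRANTS):
        print(stmt)
    for stmt in column_mask_statements(
        "main.crm.customers", "email", "pii_readers", "main.crm.mask_email"
    ):
        print(stmt)
```

Keeping grants in a reviewed map like this makes privilege changes visible in version control, which supports the accountability that shared credentials undermine.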
Monitor, measure, and enforce. Enable audit logging across Databricks workspaces. Review logs for unusual query patterns or access from unknown locations. Set real-time alerts when protected datasets are read outside normal business hours. Combine these with automated policy enforcement so violations result in immediate session termination or credential revocation.
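Alert rules like these can be prototyped against exported audit events before they are wired into a scheduler. The sketch below assumes each event has already been flattened into a dict with `user`, `table`, `source_ip`, and `event_time` keys; these names are assumptions, not the real audit-log schema, so map them from your actual source (for example the `system.access.audit` system table). The protected-table set, trusted network, and business-hours window are placeholders, and `revoke` is a hypothetical callback standing in for whatever enforcement you automate, such as ending a session or disabling a token.

```python
import ipaddress
from datetime import datetime

# Placeholder policy values; replace with your own.
PROTECTED_TABLES = {"main.crm.customers"}
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
BUSINESS_HOURS = range(8, 19)  # 08:00 to 18:59, in the timezone of event_time


def violations(event):
    """Return the reasons an event breaks policy (empty list if it does not)."""
    if event["table"] not in PROTECTED_TABLES:
        return []
    reasons = []
    when = event["event_time"]
    # Saturday and Sunday (weekday 5 and 6) count as outside business hours.
    if when.weekday() >= 5 or when.hour not in BUSINESS_HOURS:
        reasons.append("outside business hours")
    ip = ipaddress.ip_address(event["source_ip"])
    if not any(ip in network for network in TRUSTED_NETWORKS):
        reasons.append("untrusted source address")
    return reasons


def enforce(events, revoke):
    """Group violations by user and call revoke(user, reasons) once per user."""
    flagged = {}
    for event in events:
        reasons = violations(event)
        if reasons:
            flagged.setdefault(event["user"], []).extend(reasons)
    for user, reasons in flagged.items():
        revoke(user, reasons)
    return flagged


if __name__ == "__main__":
    sample = [
        {"user": "a@example.com", "table": "main.crm.customers",
         "source_ip": "203.0.113.7", "event_time": datetime(2024, 3, 6, 2, 15)},
    ]
    enforce(sample, lambda user, reasons: print("revoke", user, reasons))
```

Passing enforcement in as a callback keeps the detection logic testable on its own; a production version would likely also deduplicate alerts and log every action it takes.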