Sensitive columns hide in plain sight. They live in customer tables, payment records, medical notes, usage logs, and analytics dashboards. They hold data that, if exposed, turns into regulatory violations, fines, and reputational damage. Most teams can list the usual suspects: passwords, credit card numbers, Social Security numbers. But real risk lives in the overlooked — timestamp patterns that reveal behavior, location trails buried in metadata, or notes fields that contain personal identifiers.
Discoverability of sensitive columns is the hard part. Databases grow without a central map. New tables appear from quick features. Columns pile on after migrations. Naming conventions get sloppy. Sensitive data moves between systems unnoticed. By the time someone asks, “Where is all our PII?” the answer is buried under millions of rows across dozens of services.
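Answering "Where is all our PII?" starts with a machine-built map of every table and column. A minimal sketch of that inventory step, using SQLite's `PRAGMA table_info` for illustration (the table names and the `column_inventory` helper are hypothetical; a real scanner would walk every database and service, not one connection):

```python
import sqlite3

def column_inventory(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Return (table, column, declared_type) for every column in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    inventory = []
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for _, name, col_type, *_ in conn.execute(f"PRAGMA table_info('{table}')"):
            inventory.append((table, name, col_type))
    return inventory

# Demo: two tables that quick features might have added without review.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, notes TEXT)")
conn.execute("CREATE TABLE payments (id INTEGER, card_number TEXT)")
for table, column, col_type in column_inventory(conn):
    print(f"{table}.{column} ({col_type})")
```

The point of the inventory is completeness, not judgment: every column gets a row in the map, and classification happens afterward.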
The first step is scanning. Every schema, every table, every column. But scanning alone isn’t enough. You need classification that understands types, formats, and usage. You need context. A column called “id” could be harmless or could hold a government-issued ID. A “notes” field could contain public text or private health details. Algorithms help, but human review is essential for edge cases.
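The classification step can be sketched as a mix of format checks on sampled values and name heuristics, with ambiguous cases routed to a person. The patterns, threshold, and `classify_column` helper below are illustrative assumptions, not a production detector:

```python
import re

# Hypothetical format detectors; real scanners use far richer pattern sets.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "ssn":   re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "card":  re.compile(r"^\d{13,19}$"),
}
# Column names that mean nothing on their own and need a human look.
AMBIGUOUS_NAMES = {"id", "notes", "value", "data"}

def classify_column(name: str, sample: list[str]) -> str:
    """Label a column from a sample of its values, or flag it for review."""
    for label, pattern in PATTERNS.items():
        matches = sum(1 for v in sample if pattern.match(v))
        if sample and matches / len(sample) >= 0.8:  # tolerate a little noise
            return label
    # The name alone can't settle it: "id" might be a row key or a government ID.
    if name.lower() in AMBIGUOUS_NAMES:
        return "needs-review"
    return "unclassified"

print(classify_column("email", ["a@b.com", "c@d.org"]))       # email
print(classify_column("id", ["123-45-6789", "987-65-4321"]))  # ssn
print(classify_column("notes", ["hello", "call back Tues"]))  # needs-review
```

Note the second call: the column is named "id", but its values match the SSN format, so the format check overrides the innocuous name — the same logic that catches a government ID hiding behind a generic label. Anything the detectors cannot settle lands in the review queue rather than being silently marked safe.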