It happened in the middle of a frantic bug fix. A branch had gone sideways, a dataset looked wrong, and before I even thought through the consequences, my fingers typed a git reset command that cost us hours of forensic work. That mistake taught me more about git reset with sensitive data than all the docs combined.
git reset is a sharp tool. For source code, it can be a lifesaver. For a PII catalog (names, emails, phone numbers, IDs), it can be a grenade. It doesn’t just undo a change; it moves your branch to a different commit, and once you force-push, it rewrites the history everyone else sees. If that history contains regulated data, you’re now juggling compliance risk, legal exposure, and audit nightmares.
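The danger comes from how quietly git reset works. git reset --hard doesn’t just overwrite your working directory: it moves the current branch to the commit you name and resets the index to match. The commits it leaves behind aren’t deleted. They stay in the object database, still listed in the reflog, until those entries expire and garbage collection prunes them. So sensitive data can look “gone” while anyone with the commit hash and the right command can still read it.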
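To make that concrete, here is a minimal sketch you can run in a scratch directory. It assumes git is installed; the file name pii_catalog.csv and its contents are made up for illustration. It commits a fake PII file, resets it away, and then reads it straight back out of the object database.

```shell
# Sketch: a commit dropped by `git reset --hard` is still readable.
# pii_catalog.csv and its contents are hypothetical, for illustration only.
set -e

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit 1: a harmless baseline.
echo "schema v1" > schema.txt
git add schema.txt
git commit -q -m "Add schema"

# Commit 2: a file with fake PII lands in history.
echo "jane.doe@example.com,555-0100" > pii_catalog.csv
git add pii_catalog.csv
git commit -q -m "Add catalog"
leaked=$(git rev-parse HEAD)

# The reset moves the branch back one commit and removes the file from disk.
git reset -q --hard HEAD~1
test ! -e pii_catalog.csv && echo "working tree: file is gone"

# The commit object is untouched: the hash is enough to read the file back.
recovered=$(git show "$leaked:pii_catalog.csv")
echo "object database: $recovered"

# The reflog still lists the abandoned commit for anyone who looks.
git reflog | head -n 3
```

The point is not the specific commands but the gap between what the working tree shows and what the repository still holds.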
If you’re maintaining a PII catalog inside a repo, whether for mapping, schema, validation, or ETL pipelines, then once that data touches git it’s baked into the repository’s history unless you deliberately scrub it. And scrubbing it is a whole different workflow. Tools like git filter-repo and BFG Repo-Cleaner are designed for this job, but they come with trade-offs: the rewrite changes the hash of every commit from the first affected one onward, existing clones no longer match the rewritten history, and every contributor has to re-clone or re-sync.
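Before reaching for a rewrite tool, it helps to confirm what is actually in history. The sketch below is one way to check, again assuming git is installed and using a hypothetical file name, pii_catalog.csv. It builds a throwaway repo where the file was added and later deleted, then shows that the deletion did not remove it from history.

```shell
# Sketch: check whether a path was ever committed on any ref.
# pii_catalog.csv is a hypothetical file name, for illustration only.
set -e

# Throwaway repository; in practice you would run the two checks in your own repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "jane.doe@example.com" > pii_catalog.csv
git add pii_catalog.csv
git commit -q -m "Add catalog"

# A later commit deletes the file, so the current tree looks clean.
git rm -q pii_catalog.csv
git commit -q -m "Remove catalog"

# Check 1: count commits on any ref that touched the path.
touching=$(git log --all --format=%h -- pii_catalog.csv | wc -l | tr -d ' ')
echo "commits touching the file: $touching"

# Check 2: list blobs still stored under that path.
blobs=$(git rev-list --all --objects | grep ' pii_catalog.csv$')
echo "$blobs"

# Scrubbing is a separate, destructive step and is NOT run here. With
# git filter-repo installed, removing the path from all history looks like:
#   git filter-repo --invert-paths --path pii_catalog.csv
```

Both checks only look at refs in the local clone. git filter-repo expects to run on a fresh clone by default, and other clones and forks keep their own copies of the old objects until they are replaced too.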