I once opened a commit history and saw a phone number staring back at me.
It shouldn’t have been there. It was buried deep in an old branch, committed years ago, but Git never forgets. And neither will anyone who clones that repo. That’s when you realize: rebase isn’t just for cleaning commit messages—it’s for erasing sensitive data, for good.
Git Rebase for PII Anonymization is not about hiding mistakes in shallow logs. It's about removing Personally Identifiable Information (names, emails, addresses, credit card numbers) from the DNA of your repository. Once that data is in Git, it lives across clones, forks, and mirrors. Deleting the file in a new commit won't stop it, because every earlier commit still contains the original content. You need to rewrite history.
The process starts with identifying the exposed PII. Scan every commit on every branch, not just the current tip. Automate where you can with regex searches, scripts, or detection tooling. Make a complete list of every place the sensitive data appears. Your success depends on this inventory being accurate.
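A scan like the one described above can be sketched in Python. This is a minimal illustration, not a complete detector: the two regex patterns, the `PII_PATTERNS` name, and the `find_pii` and `scan_repo` helpers are all assumptions made for the example. It only inspects lines added in each commit's patch, so commit messages, author fields, and binary files would need separate handling.

```python
import re
import subprocess

# Hypothetical patterns for illustration; tune them to the data you expect.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(patch_text):
    """Return sorted (commit, kind, value) tuples for added lines that match."""
    hits = set()
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            # Header line of `git log` output: "commit <hash>"
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            # An added line in a diff hunk (skips the "+++ b/file" header).
            for kind, pattern in PII_PATTERNS.items():
                for match in pattern.finditer(line):
                    hits.add((commit, kind, match.group(0)))
    return sorted(hits)


def scan_repo(repo_path="."):
    """Scan the patches of every commit reachable from any ref."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--no-color"],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return find_pii(log.stdout)
```

Calling `scan_repo()` inside a clone returns the inventory as a list of tuples. Treat the result as a starting point and review it by hand, since regexes produce both false positives and misses.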
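Next, create a fresh branch from the point before the leak. Use git rebase -i or git filter-repo to surgically edit or remove the offending commits. An interactive rebase rewrites only the branch you run it on, so if the data appears on several branches or tags, git filter-repo is usually the better fit because it rewrites every ref in one pass. Replace the PII with anonymized tokens or realistic dummy data. Do not simply strip fields if that breaks the code. Make it run without the real values. Tests should pass after anonymization.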
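One way to do the replacement step is to feed git filter-repo a replacement file through its `--replace-text` option, where each line has the form `literal:OLD==>NEW`. The sketch below builds those lines from an inventory. The `build_replacements` helper, the `(kind, value)` input shape, and the token formats are assumptions for illustration. Tokens are numbered rather than hashed, because a hash of a short value such as a phone number can be brute-forced.

```python
from collections import defaultdict


def build_replacements(inventory):
    """Turn (kind, value) pairs into lines for a --replace-text file.

    Assumes values contain no newlines and no "==>" sequence.
    """
    counters = defaultdict(int)
    lines = []
    # De-duplicate and sort so the same inventory always yields the same tokens.
    for kind, value in sorted(set(inventory)):
        counters[kind] += 1
        n = counters[kind]
        if kind == "email":
            # Keep the shape of an address so code that parses it still runs.
            token = f"user{n:04d}@example.invalid"
        else:
            token = f"REDACTED_{kind.upper()}_{n:04d}"
        lines.append(f"literal:{value}==>{token}")
    return lines


def write_replacements(inventory, path="replacements.txt"):
    """Write the expressions file that git filter-repo will read."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(build_replacements(inventory)) + "\n")
```

With the file written, running `git filter-repo --replace-text replacements.txt` in a fresh clone applies the substitutions to file contents across the history. Keep the replacement file out of the repository, since it contains the very values you are removing.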