Not because of bad code. Not because of flaky tests. It collapsed under a wave of role explosion in the production environment.
Role explosion is what happens when the roles, permissions, and policies in your production environment multiply faster than anyone can manage them. What starts as a few admin and read‑only accounts turns into hundreds of complex, overlapping permission sets. Each new team member, microservice, or integration can add more. Over weeks or months, the access control layer turns brittle and dangerous.
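One way to see overlapping permission sets is to compare roles pairwise. The sketch below is illustrative only: the `ROLES` mapping, the permission strings, and the `find_overlaps` helper are hypothetical names, not part of any particular IAM system. It assumes each role can be flattened to a plain set of permissions, and it flags roles that are exact duplicates or strict subsets of one another.

```python
from itertools import combinations

# Hypothetical role definitions: role name -> set of permission strings.
ROLES = {
    "admin": {"db:read", "db:write", "deploy:run"},
    "read_only": {"db:read"},
    "billing_reader": {"db:read"},
    "deployer": {"deploy:run", "db:read"},
}


def find_overlaps(roles):
    """Return (duplicates, subsets) for a mapping of role -> permission set.

    duplicates: pairs of roles whose permission sets are identical.
    subsets: (smaller, larger) pairs where the smaller role's permissions
             are strictly contained in the larger role's.
    """
    duplicates, subsets = [], []
    # Sort names so the output order is stable across runs.
    for a, b in combinations(sorted(roles), 2):
        perms_a, perms_b = roles[a], roles[b]
        if perms_a == perms_b:
            duplicates.append((a, b))
        elif perms_a < perms_b:
            subsets.append((a, b))
        elif perms_b < perms_a:
            subsets.append((b, a))
    return duplicates, subsets


duplicates, subsets = find_overlaps(ROLES)
print("duplicate roles:", duplicates)
print("subset roles:", subsets)
```

Duplicate pairs are candidates for merging, and subset pairs suggest a role that could be expressed through another. The pairwise comparison is quadratic in the number of roles, which is usually acceptable for hundreds of roles but would need a smarter index beyond that.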
In large‑scale systems, this problem hides in plain sight. The access architecture still works, until it doesn’t. Debugging a production outage caused by tangled roles wastes engineering time, burns trust, and delays delivery. Even when you think you’ve documented every role, urgent changes in staging and ad‑hoc fixes in production quickly make that map obsolete.
At scale, small errors in access control spread into security gaps, operational slowdowns, and compliance failures. Role explosion forces engineers into reactive firefighting: tracing API calls through deeply nested permission trees, checking dozens of role definitions, and trying to predict side effects. The more distributed the system, the more dangerous every unknown permission becomes.
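Tracing a permission through nested roles by hand is the slow part, and it can be automated. The following is a minimal sketch under stated assumptions: roles inherit from other roles, and the whole structure fits in an in-memory dictionary. `ROLE_GRAPH`, its `permissions` and `inherits` keys, and `effective_permissions` are hypothetical names chosen for illustration. The function walks the inheritance graph and records which role grants each permission, so an unknown permission can be traced back to its source.

```python
# Hypothetical nested roles: each role has direct permissions and may
# inherit from other roles. "oncall" and "ops" reference each other on
# purpose, to show that the walk survives an inheritance cycle.
ROLE_GRAPH = {
    "viewer": {"permissions": {"reports:read"}, "inherits": []},
    "analyst": {"permissions": {"reports:export"}, "inherits": ["viewer"]},
    "ops": {"permissions": {"deploy:run"}, "inherits": ["analyst", "oncall"]},
    "oncall": {"permissions": {"alerts:ack"}, "inherits": ["ops"]},
}


def effective_permissions(role, graph):
    """Return {permission: granting_role} for `role`, following inheritance.

    If several roles grant the same permission, the first one reached
    during the walk is recorded.
    """
    granted = {}
    seen = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; this also breaks inheritance cycles
        seen.add(current)
        node = graph[current]
        for perm in node["permissions"]:
            granted.setdefault(perm, current)
        stack.extend(node["inherits"])
    return granted


for perm, source in sorted(effective_permissions("ops", ROLE_GRAPH).items()):
    print(f"{perm} <- {source}")
```

Recording the granting role alongside each permission is the design choice that matters here: the flat set of permissions answers "what can this role do", while the source answers "which definition do I need to change".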