What was once a neat, predictable set of configurations became an unmanageable swarm. Infrastructure as Code promised order. At large scale, it often delivers chaos instead, unless you see the patterns early and act fast. The phenomenon is real: large-scale role explosion. It slows deployments, clogs reviews, and turns onboarding into archaeology.
Role explosion happens when each team, project, or environment spawns a slightly different variant of the same privilege set. Add automation and self-service, and the duplicates multiply. Soon, even reading the code feels like wandering in a maze. You have staging roles, CI/CD roles, testing roles, ephemeral roles born from temporary needs that somehow never get cleaned up. Multiply that by dozens of services across hundreds of stacks, and you have cost, complexity, and risk on autopilot.
The root issues hide in IaC templates, policy definitions, and provisioning scripts. When teams define roles locally instead of centrally, conflicts and drift follow. Resource constraints, quick fixes, and "just this once"workarounds turn infrastructure into a patchwork. Over time, this erodes visibility, slows audits, and exposes sensitive systems to misconfigurations.