That’s what happens when a data lake grows faster than its access control model can handle. At the start, you can track permissions in a spreadsheet or IAM config. Then teams multiply. Data domains sprawl. New services come online. Roles balloon into the tens of thousands. And one day, you realize you don’t have access control. You have role explosion.
Data lake access control at scale is one of the hardest problems to solve cleanly. Traditional role-based access control (RBAC) starts breaking down when each project, department, or dataset demands its own nuanced set of permissions. You get both over-privileged users and bottlenecked approvals. Security risks rise. Audits get messy. No one knows exactly who can see what, and your lake is no longer under control.
Large-scale role explosion happens because RBAC alone can’t keep up with how fast modern data lakes change or how finely their permissions need to be scoped. Every new combination of team, dataset, and access level becomes another role to create, review, and eventually forget about, and that complexity kills both agility and security. A more sustainable approach uses attribute-based access control (ABAC) or policy-based frameworks that evaluate who is requesting access, what they are requesting, and in what context.
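To make the "who, what, and context" idea concrete, here is a minimal sketch of an attribute-based check in plain Python. It is not the API of any particular product. The attribute names (`department`, `clearance`, `domain`, `sensitivity`, `purpose`) and the two policies are hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One access request, described by attributes instead of a role name."""
    subject: dict   # who is asking, e.g. {"department": "finance", "clearance": 2}
    resource: dict  # what they want, e.g. {"domain": "finance", "sensitivity": 2}
    context: dict   # under what conditions, e.g. {"purpose": "audit"}
    action: str = "read"


def same_domain_read(req: Request) -> bool:
    """Hypothetical policy: a subject may read data owned by its own
    department if its clearance covers the data's sensitivity."""
    # Unlabeled data is treated as maximally sensitive, so it is denied.
    sensitivity = req.resource.get("sensitivity", float("inf"))
    return (
        req.action == "read"
        and req.subject.get("department") is not None
        and req.subject.get("department") == req.resource.get("domain")
        and req.subject.get("clearance", 0) >= sensitivity
    )


def audit_read(req: Request) -> bool:
    """Hypothetical policy: auditors may read any domain, but only
    when the request context declares an audit purpose."""
    return (
        req.action == "read"
        and req.subject.get("job") == "auditor"
        and req.context.get("purpose") == "audit"
    )


# Assumed policy set for this sketch; a real deployment would load
# policies from a policy store rather than hard-code them.
POLICIES = [same_domain_read, audit_read]


def is_allowed(req: Request) -> bool:
    """Default deny: allow only if at least one policy matches."""
    return any(policy(req) for policy in POLICIES)


analyst = {"department": "finance", "clearance": 2}
ledger = {"domain": "finance", "sensitivity": 2}
campaigns = {"domain": "marketing", "sensitivity": 1}

print(is_allowed(Request(analyst, ledger, {})))     # own domain
print(is_allowed(Request(analyst, campaigns, {})))  # someone else's domain
```

In this sketch, two policies cover every department and data domain pairing. Adding a new department means tagging its users and datasets with attributes, not minting a new role for each combination.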