Production access control for AI agents on Kubernetes

Before an AI agent touches a production Kubernetes cluster, decide where production access control lives. The wrong answer is "inside the agent." An agent holding a production kubeconfig is a single process with standing access to your most important workloads and control over the only record of its own behavior. This is a setup guide for the other model: production access control enforced at a boundary the agent connects through, so access is brokered per task and recorded where the agent cannot reach it.

What production access control means for an agent

Production access control is the set of rules that decide whether a given identity may run a given command against a given production resource, right now, and that record what happened. For an agent on Kubernetes the resources are the API server, pods, secrets, and deployments. The control has to do four things at once: tie access to the agent's identity, scope it to the task, broker it per session instead of leaving it standing, and record the session outside the agent. A static kubeconfig does none of these.
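
As a minimal sketch of those four properties, the following models a boundary in memory. The `Grant` and `Boundary` classes, the agent name, and the namespace are all hypothetical and are not any product's API; the point is only that identity, scope, expiry, and recording are checked in one place that the agent does not own.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """A task-scoped, time-boxed permission for one agent identity."""
    agent: str
    namespace: str
    verbs: frozenset
    resources: frozenset
    expires_at: datetime


@dataclass
class Boundary:
    """Hypothetical stand-in for the layer the agent connects through."""
    grants: list = field(default_factory=list)
    audit: list = field(default_factory=list)  # held by the boundary, not the agent

    def authorize(self, agent, verb, resource, namespace, now):
        allowed = any(
            g.agent == agent              # tied to the agent's identity
            and g.namespace == namespace  # scoped to the task
            and verb in g.verbs
            and resource in g.resources
            and now < g.expires_at        # brokered per task, not standing
            for g in self.grants
        )
        # Every request is recorded, whether it was allowed or denied.
        self.audit.append((now.isoformat(), agent, verb, resource, namespace, allowed))
        return allowed


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
boundary = Boundary()
boundary.grants.append(Grant(
    agent="agent-rollout-01",
    namespace="payments",
    verbs=frozenset({"get", "list"}),
    resources=frozenset({"pods", "deployments"}),
    expires_at=start + timedelta(minutes=15),
))

in_scope = boundary.authorize("agent-rollout-01", "get", "pods", "payments", start)
out_of_scope = boundary.authorize("agent-rollout-01", "get", "secrets", "payments", start)
after_expiry = boundary.authorize(
    "agent-rollout-01", "get", "pods", "payments", start + timedelta(minutes=16)
)
print(in_scope, out_of_scope, after_expiry)  # True False False
```

The same request succeeds inside the grant window and fails after it, and all three attempts land in `boundary.audit`. A static kubeconfig has no equivalent of either behavior.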

Setup, step by step

Put a boundary in front of the cluster. Stop giving agents direct kubeconfigs. Place an access boundary in front of the Kubernetes API and route agent traffic through it. The agent connects to the boundary; the boundary connects to the cluster. hoop.dev is built for this: it is a Layer 7 access gateway and identity-aware proxy that sits in front of the cluster's access path.
Give each agent its own identity. Register a distinct identity per agent at the boundary, not a shared service account. Production access control depends on knowing exactly which agent is asking.
Scope access to the task, not the cluster. Define the narrowest set of verbs and resources each agent's tasks require, scoped to a namespace where possible. The grant should cover the task and stop there.
Broker access per task instead of leaving it standing. Configure access to be granted on request, for the task, and to end on its own. Between tasks the agent holds nothing, so there is no resting credential to steal.
Gate the risky verbs. Route high-risk actions (namespace deletes, production exec, secret reads) through an approval step before they proceed, while low-risk reads pass automatically.
Record every session outside the agent. Because traffic runs through the gateway, the full command sequence is captured on the gateway side, outside the agent process, and stored where the agent has no write path.
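
The gating step above can be sketched as a small routing function. This is an illustration, not a gateway's actual policy format: the `route` function and its return values are hypothetical, exec is matched through the Kubernetes `pods/exec` subresource, and the choice to send unclassified writes to approval is an assumption of this sketch.

```python
READ_VERBS = {"get", "list", "watch"}

# (verb, resource) pairs the step calls out as high risk.
HIGH_RISK_PAIRS = {
    ("delete", "namespaces"),
    ("get", "secrets"),
    ("list", "secrets"),
    ("watch", "secrets"),
}

# Resources gated for every verb. Exec is requested against pods/exec.
HIGH_RISK_RESOURCES = {"pods/exec"}


def route(verb: str, resource: str) -> str:
    """Return "approval" or "auto" for a requested verb on a resource."""
    if resource in HIGH_RISK_RESOURCES or (verb, resource) in HIGH_RISK_PAIRS:
        return "approval"
    if verb in READ_VERBS:
        return "auto"
    # Assumption: anything not classified as a low-risk read goes to review.
    return "approval"


for verb, resource in [
    ("list", "pods"),
    ("get", "secrets"),
    ("create", "pods/exec"),
    ("delete", "namespaces"),
]:
    print(verb, resource, "->", route(verb, resource))
```

Checking the high-risk set before the read verbs matters: a secret read is a `get`, and it would pass automatically if the order were reversed.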

The getting-started docs for connecting a Kubernetes cluster walk through registering the connection, and the learn pages on per-task access and recording cover scoping and approvals in depth.

The verification step, which is where most setups are actually tested

A configuration you have not tried to break is a configuration you do not understand. Run three checks.

Allowed path. Run a real task through the boundary, then pull the session recording and confirm it shows the full command sequence tied to the agent identity, not a single summarized line.
Denied path. Have the agent attempt a command outside its scope, such as deleting a resource in a namespace it was not granted, and confirm two things: the action is denied, and the attempt is still recorded.
No resting access. Between tasks, confirm the agent cannot open a connection to the cluster on its own. If it can, you still have standing access and the per-task control is not really in force.

If all three hold, your production access control is enforced at the boundary, not assumed inside the agent.
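
One way to make the three checks repeatable is to script them. The sketch below assumes you can export a session's events into a simple list. The `Event` shape, the helper names, and the sample commands are hypothetical; a real recording will have its own format. The third helper only tests whether a TCP connection opens, which is a network-level approximation of "no resting access" and does not prove that no credential exists.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One recorded command, as exported from the boundary (assumed shape)."""
    agent: str
    command: str
    allowed: bool


def allowed_path_recorded(events, agent, expected_commands):
    """True if the agent's allowed commands match the task, in order."""
    recorded = [e.command for e in events if e.agent == agent and e.allowed]
    return recorded == list(expected_commands)


def denied_attempt_recorded(events, agent, command):
    """True if the out-of-scope command shows up in the record as denied."""
    return any(
        e.agent == agent and e.command == command and not e.allowed
        for e in events
    )


def can_open_connection(host, port, timeout=3.0):
    """True if a TCP connection to host:port opens from this process."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


events = [
    Event("agent-rollout-01", "kubectl get pods -n payments", True),
    Event("agent-rollout-01", "kubectl rollout status deploy/api -n payments", True),
    Event("agent-rollout-01", "kubectl delete pod api-0 -n billing", False),
]

print(allowed_path_recorded(
    events,
    "agent-rollout-01",
    ["kubectl get pods -n payments",
     "kubectl rollout status deploy/api -n payments"],
))
print(denied_attempt_recorded(
    events, "agent-rollout-01", "kubectl delete pod api-0 -n billing"
))
```

Run `can_open_connection` from inside the agent's environment, between tasks, against the cluster's API server address. A result of `True` is worth investigating even if the agent holds no credential.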

The architectural reason this order matters

The steps are not interchangeable. The boundary comes first because everything else depends on access running through a layer the agent cannot reconfigure. Identity, scope, approval, and recording are all properties of that one access path, not separate systems you wire together and hope stay in sync. Production access control is trustworthy exactly to the degree that the agent cannot grant itself access or edit the record of what it did. Putting the boundary outside the agent is what makes the other five steps mean something.

Pitfalls

Boundary in place, kubeconfig still issued. If the agent keeps a direct credential as a fallback, it can route around the boundary and the control is optional. Remove the standing credential.
Recording the agent writes itself. A session log produced inside the agent shares the agent's trust boundary and fails when the agent is compromised. Record at the gateway.
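
For the first pitfall, a quick scan of the agent's environment can catch a leftover credential. The sketch below looks in the places kubectl and in-cluster clients conventionally read from: the `KUBECONFIG` variable, `~/.kube/config`, and the mounted service account token. The function name is hypothetical and the list is not exhaustive; credentials baked into an image or fetched at runtime will not show up here.

```python
import os

# Default mount path of a pod's service account token.
IN_CLUSTER_TOKEN = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def find_standing_credentials(env, home, exists=os.path.exists):
    """Return the conventional credential paths that exist for this process."""
    # KUBECONFIG may hold several paths joined by the OS path separator.
    candidates = [p for p in env.get("KUBECONFIG", "").split(os.pathsep) if p]
    candidates.append(os.path.join(home, ".kube", "config"))
    candidates.append(IN_CLUSTER_TOKEN)
    return [p for p in candidates if exists(p)]


if __name__ == "__main__":
    for path in find_standing_credentials(os.environ, os.path.expanduser("~")):
        print("standing credential:", path)
```

An empty result from inside the agent's container or host is the expected state once the boundary is the only access path.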

FAQ

Does this replace Kubernetes RBAC?

No, it sits in front of it. RBAC still governs what a credential can do inside the cluster. Production access control at the boundary adds per-task brokering, agent identity, approvals, and recording outside the agent, which RBAC alone does not provide.
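
To illustrate the RBAC half, here is a sketch that builds a namespace-scoped, read-only Role as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The role name and namespace are hypothetical, and how a boundary maps agent identities onto cluster credentials is deployment specific and not shown.

```python
import json


def namespaced_read_role(name, namespace):
    """Build a read-only RBAC Role limited to one namespace."""
    read = ["get", "list", "watch"]
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # Pods live in the core API group, written as "".
            {"apiGroups": [""], "resources": ["pods"], "verbs": read},
            # Deployments live in the "apps" API group.
            {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": read},
        ],
    }


role = namespaced_read_role("agent-rollout-reader", "payments")
print(json.dumps(role, indent=2))
```

A Role like this caps what the underlying credential can do inside the cluster. It says nothing about which agent is asking, for how long, or where the session is recorded, which is the part the boundary adds.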

What is the first thing to set up?

The boundary. Until agent traffic runs through a layer the agent does not control, identity, scope, and recording all sit inside the agent's reach, and the rest of the control cannot be trusted.

hoop.dev is open source. To set up production access control in front of your Kubernetes cluster, start from the repository on GitHub.