Just-in-time access for AI agents on EKS

Say you run an on-call agent that restarts a stuck deployment when a health check fails. Most of the day it does nothing. For that pattern, a permanent kubeconfig on the agent is the worst possible trade: it sits idle and exposed for hours so it can act for ninety seconds. Just-in-time access inverts the trade. The agent has no EKS permissions at rest and gets a bounded grant only when a real task fires.

Just-in-time access for AI agents on EKS is the model where the grant is the exception, not the baseline. This post walks the on-call restart agent end to end, because the shape of the example is what makes the model click.

The task, and where access enters it

The agent watches an alert stream. When a deployment fails its readiness probe, it needs to run kubectl rollout restart against that one deployment in one namespace, confirm the pods come back, and stop. That is the entire blast radius it should ever have. Anything broader is standing access waiting to be abused, and the broader it is, the worse a single bad decision by the agent can be.
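
To make that scope concrete, here is a minimal Python sketch of the only two commands the agent ever needs to build. The names ALLOWED_NAMESPACE, restart_command, and status_command are hypothetical, invented for this illustration. A client-side check like this is a guardrail against the agent's own mistakes, not the control itself; the control is the RBAC role and the gateway.

```python
import shlex

# Hypothetical scope for this sketch: the single namespace the agent may touch.
ALLOWED_NAMESPACE = "payments"


def _check_scope(deployment: str, namespace: str) -> None:
    """Raise if the target falls outside the agent's intended scope."""
    if namespace != ALLOWED_NAMESPACE:
        raise PermissionError(f"namespace {namespace!r} is outside the agent's scope")
    # Reject empty names and anything that could be parsed as a flag or a path.
    if not deployment or deployment.startswith("-") or "/" in deployment:
        raise ValueError(f"invalid deployment name: {deployment!r}")


def restart_command(deployment: str, namespace: str) -> list[str]:
    """Return the kubectl argv for a rollout restart of one deployment."""
    _check_scope(deployment, namespace)
    return ["kubectl", "rollout", "restart", f"deployment/{deployment}", "-n", namespace]


def status_command(deployment: str, namespace: str, timeout_s: int = 120) -> list[str]:
    """Return the kubectl argv that waits for the rollout to finish."""
    _check_scope(deployment, namespace)
    return ["kubectl", "rollout", "status", f"deployment/{deployment}",
            "-n", namespace, f"--timeout={timeout_s}s"]


if __name__ == "__main__":
    print(shlex.join(restart_command("checkout", "payments")))
    print(shlex.join(status_command("checkout", "payments")))
```

Returning an argv list rather than a shell string keeps the alert payload from ever being interpreted by a shell.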

Why the just-in-time access grant must come from outside the agent

If the agent issues its own access, the window is whatever the agent decides, which is not a control at all. For access to be just-in-time, an authority the agent cannot reconfigure has to grant it and an automatic expiry has to revoke it. hoop.dev is an open-source Layer 7 access gateway that provides exactly that authority. Its kubernetes-eks connector proxies kubectl to the cluster through hoop.dev's own network-resident agent (a gateway component, distinct from the AI agent). That gateway agent assumes a configured IAM role, the EKS_ROLE_ARN, which is mapped to a narrow Kubernetes RBAC role.
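
The property that matters is that the issuing side sets the expiry and the requester cannot move it. A minimal Python sketch of that idea follows. It is a hypothetical model for illustration only, not hoop.dev's API or data model; Grant and issue_grant are made-up names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)  # frozen: the holder cannot edit the window after issue
class Grant:
    subject: str          # who asked, e.g. the on-call agent
    connection: str       # what the grant applies to
    issued_at: datetime
    window: timedelta

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + self.window

    def is_active(self, now: datetime) -> bool:
        """True only inside the half-open interval [issued_at, expires_at)."""
        return self.issued_at <= now < self.expires_at


def issue_grant(subject: str, connection: str, now: datetime,
                window: timedelta = timedelta(minutes=15)) -> Grant:
    """Called by the authority, never by the requester."""
    return Grant(subject=subject, connection=connection, issued_at=now, window=window)


if __name__ == "__main__":
    start = datetime.now(timezone.utc)
    grant = issue_grant("agent-oncall", "prod-eks", start)
    print(grant.is_active(start + timedelta(minutes=5)))
    print(grant.is_active(start + timedelta(minutes=15)))
```

The authority evaluates is_active on every request using its own clock, so the grant ends at the deadline whether or not the agent cooperates.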

Walking the example

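Create an RBAC role that grants only what a restart needs in the payments namespace (kubectl rollout restart works by patching the deployment, so that means get and patch on deployments), and map the EKS connection's IAM role to it.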
In hoop.dev, set the connection to grant just-in-time access only, with a 15-minute window and approval required for production.
The alert fires. The on-call agent requests access on the EKS connection through the gateway.
An approver confirms, or an auto-approve policy for this narrow action lets it through. The gateway opens a 15-minute session.
The agent runs kubectl rollout restart deployment/checkout -n payments, watches the rollout, and finishes. The session lapses on its own.

request:  agent-oncall -> connection prod-eks
scope:    rollout restart, namespace payments only
window:   15m, auto-revoke
result:   session recorded, grant gone at the 15-minute mark
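
The RBAC role from the first step can be sketched as a manifest. The Python below builds a Role and RoleBinding as plain dictionaries and prints them as JSON, which kubectl accepts. Several things here are assumptions to check against your own cluster: the group name oncall-restart is hypothetical, the verb list is a best guess at what restart plus rollout status need, and the mapping from the IAM role to that group (through an EKS access entry or the aws-auth ConfigMap) is not shown. Note that RBAC has no "restart" verb: patch on a deployment also permits other patches to that deployment, which is one more reason to pin resourceNames.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def restart_role(namespace: str, deployment: str) -> dict:
    """Role allowing a restart of one named deployment, plus read-only status checks."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": f"restart-{deployment}", "namespace": namespace},
        "rules": [
            {   # Write path: patch is limited to the one named deployment.
                "apiGroups": ["apps"],
                "resources": ["deployments"],
                "resourceNames": [deployment],
                "verbs": ["get", "patch"],
            },
            {   # Read path (assumed): lets rollout status watch progress.
                "apiGroups": ["apps"],
                "resources": ["deployments"],
                "verbs": ["get", "list", "watch"],
            },
        ],
    }


def restart_binding(namespace: str, role_name: str, group: str) -> dict:
    """Bind the role to a Kubernetes group that the IAM role is mapped to."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
    }


if __name__ == "__main__":
    role = restart_role("payments", "checkout")
    # "oncall-restart" is a hypothetical group name for this sketch.
    binding = restart_binding("payments", role["metadata"]["name"], "oncall-restart")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [role, binding]}, indent=2))
```

Because both objects are a namespaced Role and RoleBinding rather than their cluster-wide equivalents, nothing here can grant access outside payments.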

What you get from this shape

The agent is harmless at rest. A leaked agent credential grants nothing on the cluster on its own.
The grant is scoped to one action in one namespace, so even an active session has a tiny blast radius.
The window closes itself, so there is no cleanup step to forget and no orphaned binding to find in next quarter's audit.

Pitfalls

Do not widen the RBAC role "to be safe." A narrow role is the point; width quietly undoes just-in-time access.
Do not auto-approve broad actions. Reserve auto-approve for the narrow, well-understood ones and route the rest to a human.
Do not stretch the window because requesting again is mildly annoying. The friction is doing its job.
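
The three pitfalls above amount to a small routing rule, sketched here in Python. This is a hypothetical policy model, not hoop.dev configuration; AccessRequest, AUTO_APPROVE, MAX_WINDOW, and route are names invented for the illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class AccessRequest:
    subject: str
    action: str
    namespace: str
    window: timedelta


# Only narrow, well-understood (action, namespace) pairs skip the human.
AUTO_APPROVE = {("rollout restart", "payments")}
# Hard ceiling on the window; asking again is the intended friction.
MAX_WINDOW = timedelta(minutes=15)


def route(req: AccessRequest) -> str:
    """Return 'deny', 'auto-approve', or 'human-review' for a request."""
    if req.window > MAX_WINDOW:
        return "deny"
    if (req.action, req.namespace) in AUTO_APPROVE:
        return "auto-approve"
    return "human-review"


if __name__ == "__main__":
    narrow = AccessRequest("agent-oncall", "rollout restart", "payments", timedelta(minutes=15))
    broad = AccessRequest("agent-oncall", "delete", "payments", timedelta(minutes=15))
    long_window = AccessRequest("agent-oncall", "rollout restart", "payments", timedelta(hours=1))
    for req in (narrow, broad, long_window):
        print(req.action, req.window, "->", route(req))
```

The default branch is human review, so any action nobody thought about in advance reaches a person instead of being silently approved.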

The argument

Standing access optimizes for the agent's convenience. Just-in-time access optimizes for the window of exposure, and that is the right thing to optimize when the actor never sleeps. hoop.dev makes EKS access run on bounded, approved windows. Start with the getting started guide and read more on scoped, time-bound access.

FAQ

How short should the window be? As short as the task. A restart needs minutes, not hours.

What happens at expiry? The gateway revokes the session against the EKS connection automatically. No manual teardown.

Does the agent hold the IAM role? No. The role lives on the connection and is assumed by the gateway agent; the AI agent only requests access.

hoop.dev is open source and MIT licensed. Grab it at github.com/hoophq/hoop and give your agents just-in-time access instead of a key they keep forever.