Production access control for autonomous agents on EKS

Giving an autonomous build or scaling agent unrestricted kubeconfig access to a production EKS cluster invites runaway privilege escalation. Such agents need production access control that is scoped to the task, limited in time, and fully audited.

In many organizations the easiest way to let a CI/CD pipeline, auto‑scaler, or self‑healing service interact with Kubernetes is to drop a static service‑account token or a long‑lived kubeconfig file onto the host that runs the agent. The token is often bound to cluster‑admin or another broadly scoped role because engineering teams want to avoid the friction of fine‑grained RBAC. The result is a single credential that can create, delete, or modify any workload, reused across dozens of jobs and environments and sometimes across multiple clusters. Because the token is static, it lives on disk, gets copied into container images, and is sometimes checked into source control by accident. No central audit captures what each autonomous job actually did, and no inline guard stops a misbehaving script from issuing a destructive command.

What teams really need is production access control that limits each agent to the exact actions required for its purpose, grants that permission only for the short window when the job runs, and records every API call for later review. The missing piece is a control surface that sits between the agent and the Kubernetes API, because without a gateway the request still travels directly to the cluster. Direct traffic bypasses any opportunity for just‑in‑time approval, command‑level audit, or real‑time response masking.

Why production access control matters on EKS

Production workloads are the most valuable assets in a cloud‑native environment. A single stray kubectl delete pod or an unintended helm upgrade can cascade into a service outage that affects thousands of users. Autonomous agents amplify that risk because they run without human supervision, often scaling up and down dozens of times per hour. When an agent is granted a static, high‑privilege token, the blast radius of a compromised credential expands dramatically. Attackers who exfiltrate the token can pivot to any namespace, read secrets, and, with cluster‑admin rights, rewrite RBAC bindings or, on clusters that still use the aws-auth ConfigMap, the mapping from IAM principals to Kubernetes groups.

Production access control addresses three core concerns: least‑privilege entitlement, time‑bounded access, and verifiable evidence of what was done. Least‑privilege ensures the agent can only invoke the API verbs it needs, for example list and create on a specific deployment. Time‑bounding guarantees that the permission exists only while the job is active, closing the window for misuse. Verifiable evidence means that every request and response is captured, so auditors and incident responders can replay the exact sequence of actions.
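The three concerns can be sketched as a small data structure. This is an illustration only, not hoop.dev's policy syntax: the `Grant` class, the `evidence` helper, and the agent and namespace names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """Hypothetical time-bounded, least-privilege grant for one agent."""
    agent: str
    namespace: str
    resource: str
    verbs: frozenset      # the only API verbs the agent may use
    expires_at: datetime  # the grant is void from this instant on

    def permits(self, agent, namespace, resource, verb, now):
        # Deny by default: every condition must hold for the request to pass.
        return (
            now < self.expires_at
            and agent == self.agent
            and namespace == self.namespace
            and resource == self.resource
            and verb in self.verbs
        )


def evidence(grant, agent, namespace, resource, verb, now):
    """Build one audit record per decision so the sequence can be reviewed later."""
    return {
        "time": now.isoformat(),
        "agent": agent,
        "namespace": namespace,
        "resource": resource,
        "verb": verb,
        "allowed": grant.permits(agent, namespace, resource, verb, now),
    }


# Example: a scaling job gets 15 minutes of get/patch on deployments in "payments".
job_start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = Grant("scaler", "payments", "deployments",
              frozenset({"get", "patch"}), job_start + timedelta(minutes=15))

during = evidence(grant, "scaler", "payments", "deployments", "patch", job_start)
after = evidence(grant, "scaler", "payments", "deployments", "patch",
                 job_start + timedelta(minutes=16))
print(during["allowed"], after["allowed"])  # True False
```

The same request is allowed while the job runs and refused once the window closes, and both outcomes leave a record.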

How a gateway can enforce control

These controls can only be enforced on the data path that carries the API traffic. A gateway placed on that path becomes the single point where identity, policy, and enforcement intersect. It inspects each HTTP request to the Kubernetes API server, matches the caller’s identity against a policy that defines allowed verbs, resources, and namespaces, and decides whether to forward the request as is, forward it and mask the response, hold it for manual approval, or deny it.
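As a rough sketch of that inspection step, the code below derives the Kubernetes verb, namespace, and resource from an HTTP method and URL, then looks up a decision. The method-to-verb mapping follows the Kubernetes authorization documentation. The `POLICY` layout, the action names, and the `cluster.example` host are assumptions made for illustration and do not reflect hoop.dev's configuration format.

```python
from urllib.parse import parse_qs, urlsplit

# HTTP method -> Kubernetes request verb for write methods.
WRITE_VERBS = {"POST": "create", "PUT": "update", "PATCH": "patch"}


def request_attributes(method, url):
    """Derive verb, namespace, resource, and name from a Kubernetes API request.

    Simplified: subresources such as /log or /status are ignored.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if segments[:1] == ["api"]:        # core group: /api/<version>/...
        rest = segments[2:]
    elif segments[:1] == ["apis"]:     # named groups: /apis/<group>/<version>/...
        rest = segments[3:]
    else:
        rest = []
    if not rest:
        raise ValueError("not a resource request: " + parts.path)

    namespace = None
    if rest[0] == "namespaces" and len(rest) >= 3:
        namespace, rest = rest[1], rest[2:]
    resource = rest[0]
    name = rest[1] if len(rest) > 1 else None

    method = method.upper()
    if method in ("GET", "HEAD"):
        watching = parse_qs(parts.query).get("watch", ["false"])[0] in ("true", "1")
        verb = "watch" if watching else ("get" if name else "list")
    elif method == "DELETE":
        verb = "delete" if name else "deletecollection"
    elif method in WRITE_VERBS:
        verb = WRITE_VERBS[method]
    else:
        raise ValueError("unsupported method: " + method)
    return {"verb": verb, "namespace": namespace, "resource": resource, "name": name}


# Hypothetical policy: per caller, an ordered list of rules; first match wins.
POLICY = {
    "scaler": [
        {"verbs": {"get", "list"}, "resources": {"secrets"},
         "namespaces": {"prod"}, "action": "mask"},
        {"verbs": {"patch"}, "resources": {"deployments"},
         "namespaces": {"prod"}, "action": "require_approval"},
        {"verbs": {"get", "list", "watch"}, "resources": {"pods", "deployments"},
         "namespaces": {"prod"}, "action": "allow"},
    ],
}


def decide(caller, method, url):
    """Return the action of the first matching rule; unknown callers are denied."""
    attrs = request_attributes(method, url)
    for rule in POLICY.get(caller, []):
        if (attrs["verb"] in rule["verbs"]
                and attrs["resource"] in rule["resources"]
                and attrs["namespace"] in rule["namespaces"]):
            return rule["action"]
    return "deny"


base = "https://cluster.example"
print(decide("scaler", "GET", base + "/api/v1/namespaces/prod/pods"))
print(decide("scaler", "PATCH", base + "/apis/apps/v1/namespaces/prod/deployments/web"))
print(decide("scaler", "DELETE", base + "/api/v1/namespaces/prod/pods/web-1"))
```

Defaulting to "deny" when no rule matches keeps the policy least-privilege: an agent can do only what a rule explicitly names.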

When the gateway sees a request that matches a policy requiring approval, it pauses the flow and routes the operation to a human reviewer. If the reviewer approves, the gateway forwards the request; otherwise it returns a denial. For operations that are allowed, the gateway can mask sensitive fields in the response, such as secret data or token values, before they reach the agent. Every request and response is recorded in a session log that can be replayed later for audit or forensic analysis. Because the gateway sits in the data path, none of these enforcement outcomes are possible if the agent connects directly to the cluster.
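Response masking and session logging could look roughly like the following. This is a generic sketch, not hoop.dev's masking engine: `mask_secret_values` and `session_entry` are hypothetical helpers, and the sketch assumes only Secret payloads (`data` and `stringData`) need redaction.

```python
import copy
import json


def mask_secret_values(obj, placeholder="***"):
    """Return a copy of a Kubernetes API response with Secret values redacted."""
    masked = copy.deepcopy(obj)  # never mutate the upstream response
    if masked.get("kind") == "SecretList":
        # Items inside a list response usually omit their own "kind".
        secrets = masked.get("items", [])
    elif masked.get("kind") == "Secret":
        secrets = [masked]
    else:
        secrets = []
    for secret in secrets:
        for field in ("data", "stringData"):
            values = secret.get(field)
            if values:
                # Keep the key names so the agent can still see which keys exist.
                secret[field] = {key: placeholder for key in values}
    return masked


def session_entry(caller, verb, path, decision, response):
    """Serialize one request/response pair; only the masked body is stored."""
    return json.dumps({
        "caller": caller,
        "verb": verb,
        "path": path,
        "decision": decision,
        "response": mask_secret_values(response),
    }, sort_keys=True)


raw = {"kind": "Secret", "metadata": {"name": "db"}, "data": {"password": "cGFzcw=="}}
masked = mask_secret_values(raw)
entry = session_entry("scaler", "get", "/api/v1/namespaces/prod/secrets/db", "mask", raw)
print(masked["data"])  # {'password': '***'}
```

Masking on a deep copy leaves the original response untouched, so the gateway can decide separately what the agent sees and what the log retains.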

Implementing the pattern with hoop.dev

hoop.dev provides exactly the gateway described above for EKS clusters. It runs its own network‑resident agent, which assumes a dedicated IAM role configured for EKS authentication. That role is mapped to a Kubernetes RBAC binding, so the hoop.dev agent presents a short‑lived identity that the cluster trusts. The gateway then proxies every kubectl or API request through its Layer 7 inspection layer.

hoop.dev records each session, capturing the full command history and API payloads. It can mask fields like metadata.annotations that contain secrets, ensuring that even a compromised agent never sees raw credential values. When a policy requires human sign‑off, such as a rollout to a production namespace, the gateway pauses the request and presents it to an approver via the built‑in workflow UI. Only after approval does hoop.dev forward the request to the EKS control plane.

Because the enforcement logic lives in hoop.dev, the autonomous agent never sees the underlying service account token or the IAM credentials used to talk to the cluster. It presents only the user’s OIDC token, which hoop.dev validates against the organization’s identity provider. This separation of identity (setup) from enforcement (data path) satisfies the attribution model: setup decides who can start a session, the gateway is the only place enforcement happens, and the audit, masking, and approval outcomes exist solely because hoop.dev sits in the data path.
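To make the setup step concrete, here is a minimal sketch of the claim checks a gateway might apply to an OIDC token before opening a session. It assumes the token's signature has already been verified against the identity provider's keys by a JWT library, which this code does not do. The function name, issuer URL, and audience value are hypothetical, and nothing here describes hoop.dev's actual validation.

```python
def claims_acceptable(claims, issuer, audience, now):
    """Check issuer, audience, and expiry on ALREADY signature-verified claims.

    `now` and the `exp` claim are seconds since the Unix epoch.
    """
    aud = claims.get("aud")
    # The JWT spec allows "aud" to be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    exp = claims.get("exp")
    return (
        claims.get("iss") == issuer
        and audience in audiences
        and isinstance(exp, (int, float))
        and now < exp
    )


# Hypothetical values for illustration.
claims = {"iss": "https://idp.example", "aud": ["gateway"], "sub": "ci-job", "exp": 2000}
print(claims_acceptable(claims, "https://idp.example", "gateway", now=1000))  # True
print(claims_acceptable(claims, "https://idp.example", "gateway", now=3000))  # False
```

Because the token expires on its own, a leaked copy stops working without anyone having to rotate a static credential.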

To try this approach, start with the getting‑started guide that walks through deploying the gateway, configuring the EKS IAM role, and defining a production access control policy. The full feature reference is available on the learn page, where you can explore policy syntax, session replay, and masking options. For the actual implementation details, including the Helm chart and IAM role definitions, see the open‑source repository on GitHub: https://github.com/hoophq/hoop.

FAQ

Does hoop.dev replace Kubernetes RBAC?

No. hoop.dev works alongside RBAC. It assumes an IAM role that maps to a Kubernetes RBAC binding, then adds a layer of policy that enforces time‑bounded access, approval workflows, and response masking.

Can I use hoop.dev with existing CI pipelines?

Yes. The pipeline only needs to present an OIDC token that the gateway trusts. The gateway then handles credential injection, so the pipeline never stores static kubeconfig files.

What evidence does hoop.dev generate for auditors?

hoop.dev generates a complete session log for every connection, including request metadata, masked responses, and any approval decisions. Those logs can be exported for SOC 2, ISO 27001, or internal compliance reviews.
