Your CI finished green, but the task on ECS never launched. The agent sat idle, permissions failed silently, and your logs turned into a guessing game. That’s usually the moment someone mutters, “We need to fix Buildkite ECS.”
Buildkite orchestrates pipelines well, but it leaves the compute that runs them to you. Amazon ECS covers that part by scheduling containers across a cluster. When you join them, you get ephemeral, isolated build runners that scale with demand and shut down when idle. It sounds perfect, and it is, but only once the identity and access rules click into place.
The Buildkite ECS integration runs containerized agents inside ECS tasks and uses AWS IAM roles to decide what those tasks may touch. The agents connect back to your Buildkite pipelines and pull jobs over authenticated, short-lived sessions. The advantage is elasticity. You can run massive parallel builds during peak hours and pay nothing for agent compute once they stop.
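As a rough illustration of that elasticity, the sketch below decides how many extra agent tasks to start and splits the result into keyword-argument dictionaries shaped for boto3's `ecs.run_task` call. The function names, the cluster and task definition names, the subnet ID, and the choice of Fargate are all assumptions made for the example, not part of any official integration. It only builds the requests and never contacts AWS.

```python
from typing import Any, Dict, List

# ECS RunTask accepts at most 10 tasks per request.
MAX_TASKS_PER_CALL = 10


def agents_to_launch(scheduled_jobs: int, idle_agents: int,
                     running_agents: int, max_agents: int) -> int:
    """Return how many new agent tasks to start (hypothetical scaling rule).

    Jobs that idle agents can absorb need no new task, and the total
    number of agents never exceeds max_agents.
    """
    needed = max(0, scheduled_jobs - idle_agents)
    headroom = max(0, max_agents - running_agents)
    return min(needed, headroom)


def run_task_requests(count: int, cluster: str, task_definition: str,
                      subnets: List[str]) -> List[Dict[str, Any]]:
    """Split `count` tasks into kwargs dicts for boto3's ecs.run_task."""
    requests: List[Dict[str, Any]] = []
    remaining = count
    while remaining > 0:
        batch = min(remaining, MAX_TASKS_PER_CALL)
        requests.append({
            "cluster": cluster,
            "taskDefinition": task_definition,
            "count": batch,
            "launchType": "FARGATE",  # assumption: Fargate, not EC2 capacity
            "networkConfiguration": {
                "awsvpcConfiguration": {
                    "subnets": subnets,
                    "assignPublicIp": "DISABLED",
                }
            },
        })
        remaining -= batch
    return requests


if __name__ == "__main__":
    extra = agents_to_launch(scheduled_jobs=25, idle_agents=2,
                             running_agents=5, max_agents=20)
    for kwargs in run_task_requests(extra, "ci-cluster",
                                    "buildkite-agent", ["subnet-example"]):
        # With boto3 installed: boto3.client("ecs").run_task(**kwargs)
        print(kwargs["count"])
```

When the queue is empty the function returns zero and no requests are built, which is the "pay nothing when idle" half of the bargain.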
A correct setup starts with identity. Each ECS task role must trust the ECS tasks service (ecs-tasks.amazonaws.com), and any role a job assumes with a Buildkite-issued OIDC token must trust Buildkite's OIDC provider. That trust allows the Buildkite agent to request short-lived credentials without storing AWS keys anywhere. Think of it as temporary keys that vanish before anyone can screenshot them.
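The two trust relationships can be sketched as IAM trust policy documents built in Python. The first is the standard trust for an ECS task role. The second is an assumption-heavy example for a role assumed with a Buildkite OIDC token: the provider host `agent.buildkite.com`, the `sts.amazonaws.com` audience, and the `organization:...:pipeline:...` subject pattern reflect my understanding of Buildkite's OIDC claims and should be checked against the current Buildkite documentation. The account ID and slugs are placeholders.

```python
import json
from typing import Any, Dict


def ecs_task_trust_policy() -> Dict[str, Any]:
    """Trust policy that lets ECS tasks assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def buildkite_oidc_trust_policy(account_id: str, org_slug: str,
                                pipeline_slug: str) -> Dict[str, Any]:
    """Trust policy for a role assumed with a Buildkite OIDC token.

    Assumption: the OIDC provider is registered in IAM under
    agent.buildkite.com and tokens are requested with the
    sts.amazonaws.com audience.
    """
    provider = "agent.buildkite.com"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {f"{provider}:aud": "sts.amazonaws.com"},
                # Restrict the role to one pipeline in one organization.
                "StringLike": {
                    f"{provider}:sub":
                        f"organization:{org_slug}:pipeline:{pipeline_slug}:*"
                },
            },
        }],
    }


if __name__ == "__main__":
    print(json.dumps(ecs_task_trust_policy(), indent=2))
    print(json.dumps(
        buildkite_oidc_trust_policy("123456789012", "example-org", "deploy"),
        indent=2))
```

Scoping the subject condition to a single pipeline is the design choice that matters here: without it, any pipeline in the organization could assume the role.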
Permissions are the next trap. If your ECS task role is too broad, every build can read shared secrets. Too narrow, and the agent cannot even download artifacts. The sweet spot is a policy that grants minimal privileges per pipeline environment. The credentials a role issues already expire on their own, so what ages is the role and its policy: review them on a schedule, or automate the review entirely. Both are easier than explaining to an auditor why the same role has existed unchanged since 2019.
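One way to picture that sweet spot is a policy generated per environment, so that a staging build can never read production paths. The sketch below is illustrative only: the bucket name, the `buildkite/<environment>/` secret prefix, and the function names are hypothetical, and a real pipeline may need more or different actions.

```python
from typing import Any, Dict, List


def pipeline_policy(artifact_bucket: str, environment: str,
                    region: str, account_id: str) -> Dict[str, Any]:
    """Build a least-privilege policy scoped to one environment (sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ArtifactsForThisEnvironment",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                # Only objects under this environment's prefix.
                "Resource": f"arn:aws:s3:::{artifact_bucket}/{environment}/*",
            },
            {
                "Sid": "SecretsForThisEnvironment",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                # Hypothetical naming scheme: buildkite/<environment>/<name>
                "Resource": (
                    f"arn:aws:secretsmanager:{region}:{account_id}"
                    f":secret:buildkite/{environment}/*"
                ),
            },
        ],
    }


def wildcard_actions(policy: Dict[str, Any]) -> List[str]:
    """Return every action in the policy that contains a wildcard."""
    found: List[str] = []
    for statement in policy["Statement"]:
        for action in statement["Action"]:
            if "*" in action:
                found.append(action)
    return found


if __name__ == "__main__":
    policy = pipeline_policy("example-ci-artifacts", "staging",
                             "us-east-1", "123456789012")
    # A simple guard a review script could run before attaching the policy.
    assert wildcard_actions(policy) == []
```

A check like `wildcard_actions` is cheap to run in CI and catches the most common way a role drifts from "minimal" to "too broad".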