DEV Community

Matt

Originally published at fortem.dev

How to Debug AWS Fargate Containers with ECS Exec

You moved to Fargate. No more SSH. No more docker exec. Your container is failing and you can't get inside.

ECS Exec — AWS's answer to docker exec for Fargate — has been around since 2021. It bind-mounts the SSM agent into your running container at runtime. No sidecar. No ports. No keys. Just IAM.

This guide covers setup, the 5 errors that catch everyone, and the production controls you actually need.


Why ECS Exec exists

Fargate has no hosts to SSH into. Before ECS Exec launched in March 2021, there was no supported way to get a shell inside a running Fargate container. It was the most requested feature on the AWS Containers Roadmap.

| ECS on EC2 (before) | ECS on Fargate (with ECS Exec) |
|---|---|
| SSH into the EC2 instance | aws ecs execute-command (no SSH) |
| docker exec -it container bash | /bin/bash via SSM |
| Open ports, manage SSH keys | No open ports or keys; IAM controls access |
| Locate the instance in the ASG first | Target the task ID directly |

Key fact: ECS Exec is not a sidecar. ECS bind-mounts the SSM agent binaries into your existing container at runtime, so you don't add a container to your task definition; you turn the feature on per service or task with enableExecuteCommand.


Download the skill file first

Before you hit one of the 5 errors below — there's a skill file on fortem.dev that an AI agent (Claude Code, OpenCode, Codex) can run for you.

It checks:

  • Whether --enable-execute-command is set on your service
  • Whether the task role has the right SSM permissions
  • Whether the Session Manager plugin is installed locally
  • Network path to SSM endpoints
  • Read-only filesystem settings

Get the ECS Exec Readiness skill file → fortem.dev/blog/ecs-exec-guide

Drop the .md file into your AI agent and it runs the 5-point checklist against your AWS account. Everything runs locally, read-only by default.


The 5 errors that catch everyone

01 — ExecuteCommandAgent not RUNNING

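Cause: You forgot --enable-execute-command when creating or updating the service. The setting only applies to tasks launched after the change, which is why the fix forces a new deployment.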

aws ecs update-service \
    --cluster your-cluster \
    --service your-service \
    --enable-execute-command \
    --force-new-deployment

Verify: run aws ecs describe-tasks against a newly launched task and check for "enableExecuteCommand": true and an ExecuteCommandAgent entry with "lastStatus": "RUNNING".
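The same check can be scripted. Below is a minimal bash sketch, assuming AWS CLI v2 with working credentials; check_exec_ready is a hypothetical helper name and the cluster and task arguments are placeholders. It reads enableExecuteCommand and the ExecuteCommandAgent status from each container's managedAgents list via --query.

```bash
# Hypothetical helper: check ECS Exec readiness for one task.
# Assumes AWS CLI v2 with credentials; arguments are placeholders.
check_exec_ready() {
  local cluster="$1" task="$2" enabled statuses

  # enableExecuteCommand is fixed when the task launches; JSON output prints true/false
  enabled=$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task" \
    --query 'tasks[0].enableExecuteCommand' --output json)
  if [ "$enabled" != "true" ]; then
    echo "enableExecuteCommand is ${enabled:-unset}; redeploy with --enable-execute-command" >&2
    return 1
  fi

  # Each container lists an ExecuteCommandAgent under managedAgents
  statuses=$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task" \
    --query "tasks[0].containers[].managedAgents[] | [?name=='ExecuteCommandAgent'].lastStatus" \
    --output text)
  case "$statuses" in
    *RUNNING*) echo "ready: ExecuteCommandAgent status ${statuses}" ;;  # at least one container RUNNING
    *) echo "ExecuteCommandAgent not RUNNING (got: ${statuses:-none})" >&2; return 1 ;;
  esac
}
```

Usage: check_exec_ready my-cluster YOUR_TASK_ID. In a multi-container task it passes if any container's agent is RUNNING, so check the container you plan to target.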


02 — TargetNotConnectedException: The execute command failed due to an internal error

Cause: Your task IAM role doesn't have SSM permissions. This is the #1 cause of silent failures.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ssmmessages:CreateControlChannel",
      "ssmmessages:CreateDataChannel",
      "ssmmessages:OpenControlChannel",
      "ssmmessages:OpenDataChannel"
    ],
    "Resource": "*"
  }]
}

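Attach this policy to the task role, not the execution role. The SSM agent runs inside the container, so it is the task that needs the permissions. An AccessDeniedException saying you are "not authorized to perform: ecs:ExecuteCommand" is a different problem: the IAM user or role running the CLI lacks ecs:ExecuteCommand.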
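To see which ARN is the task role and which is the execution role, you can read both from the task definition. A sketch, assuming AWS CLI v2 with credentials; show_task_roles, attach_ssm_policy, the inline policy name ecs-exec-ssmmessages, and a policy file holding the JSON above are all hypothetical names for illustration.

```bash
# Hypothetical helpers; assumes AWS CLI v2 with credentials.
# Print the task role and execution role ARNs from a task definition.
show_task_roles() {
  local roles task_role exec_role
  roles=$(aws ecs describe-task-definition --task-definition "$1" \
    --query 'taskDefinition.[taskRoleArn, executionRoleArn]' --output text)
  read -r task_role exec_role <<< "$roles"
  echo "task role: ${task_role}"        # the ssmmessages policy goes here
  echo "execution role: ${exec_role}"   # used by ECS to pull images and ship logs
}

# Attach the ssmmessages policy (saved as a JSON file) inline to the task role.
attach_ssm_policy() {
  local role_name="$1" policy_file="$2"
  aws iam put-role-policy --role-name "$role_name" \
    --policy-name ecs-exec-ssmmessages \
    --policy-document "file://${policy_file}"
}
```

If the task role prints as None, the task definition has no task role at all, and you need to create one and reference it via taskRoleArn.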


03 — SessionManagerPlugin is not found

Cause: The SSM Session Manager plugin is not installed on your local machine.

# macOS
curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/mac/sessionmanager-bundle.zip" -o "session.zip"
unzip session.zip
sudo ./sessionmanager-bundle/install -i /usr/local/sessionmanagerplugin -b /usr/local/bin/session-manager-plugin

# Verify
session-manager-plugin --version
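If you script ECS Exec, it helps to fail early when the plugin is missing, since the AWS CLI hands the session off to it. A minimal sketch; require_session_plugin is a hypothetical helper that only checks PATH.

```bash
# Hypothetical helper: fail fast if session-manager-plugin is not on PATH.
require_session_plugin() {
  if command -v session-manager-plugin >/dev/null 2>&1; then
    echo "session-manager-plugin found at $(command -v session-manager-plugin)"
  else
    echo "session-manager-plugin not on PATH; install it before running execute-command" >&2
    return 1
  fi
}
```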

04 — Timeout, session never connects

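Cause: Your Fargate task can't reach the Session Manager endpoint (ssmmessages). In a private subnet, that usually means there is neither a NAT gateway route nor an ssmmessages interface VPC endpoint, or a security group blocks outbound HTTPS (443).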

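# Create an interface VPC endpoint for ssmmessages (recommended for private subnets)
# Replace region (e.g. us-east-1); the security group must allow inbound HTTPS (443) from your tasks
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-xxx \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.region.ssmmessages \
    --subnet-ids subnet-xxx \
    --security-group-ids sg-xxx \
    --private-dns-enabled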
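To confirm whether an endpoint already exists before creating one, a sketch along these lines lists ssmmessages endpoints in a VPC. Assumptions: AWS CLI v2 with credentials, placeholder VPC ID and region, and hypothetical helper names. It only checks existence and state, not private DNS or security group rules.

```bash
# Hypothetical helpers; assumes AWS CLI v2 with credentials.
# Interface endpoint service names follow com.amazonaws.<region>.ssmmessages.
ssmmessages_service() { echo "com.amazonaws.${1}.ssmmessages"; }

# Report the state of any ssmmessages endpoint in a VPC (e.g. "available").
check_ssmmessages_endpoint() {
  local vpc_id="$1" region="$2" state
  state=$(aws ec2 describe-vpc-endpoints --region "$region" \
    --filters "Name=vpc-id,Values=${vpc_id}" \
              "Name=service-name,Values=$(ssmmessages_service "$region")" \
    --query 'VpcEndpoints[].State' --output text)
  if [ -z "$state" ] || [ "$state" = "None" ]; then
    echo "no ssmmessages endpoint in ${vpc_id}: add one or route through a NAT gateway" >&2
    return 1
  fi
  echo "ssmmessages endpoint state: ${state}"
}
```

Usage: check_ssmmessages_endpoint vpc-xxx us-east-1.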

05 — Session starts but commands fail — cannot create directory

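Cause: Your container has readonlyRootFilesystem: true. The SSM agent writes to /var/lib/amazon/ssm/, so it needs a writable filesystem.

"readonlyRootFilesystem": false,
"linuxParameters": {
  "initProcessEnabled": true
}

Set readonlyRootFilesystem to false in the container definition. AWS documents a read-only root filesystem as unsupported for ECS Exec, because the agent has to create directories and files at runtime. initProcessEnabled is a separate AWS recommendation: it runs an init process as PID 1 so that child processes left behind by the SSM agent get reaped.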
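To find the offending container before you redeploy, you can query the task definition. A sketch, assuming AWS CLI v2 with credentials; the helper names are hypothetical and the family:revision argument (e.g. my-app:3) is a placeholder.

```bash
# Hypothetical helpers; assumes AWS CLI v2 with credentials.
# Names of containers in a task definition with readonlyRootFilesystem=true.
readonly_containers() {
  aws ecs describe-task-definition --task-definition "$1" \
    --query 'taskDefinition.containerDefinitions[?readonlyRootFilesystem==`true`].name' \
    --output text
}

check_writable_root() {
  local names
  names=$(readonly_containers "$1")
  # No match prints nothing in text output; treat "None" the same way
  if [ -n "$names" ] && [ "$names" != "None" ]; then
    echo "read-only root filesystem set on: ${names}" >&2
    return 1
  fi
  echo "no container sets readonlyRootFilesystem=true"
}
```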


The happy path — step by step

Step 1 — Install Session Manager plugin (see error 03 above)

Step 2 — Task IAM role policy (see error 02 above: attach the four ssmmessages actions to the task role)

Step 3 — Enable on service:

aws ecs update-service \
    --cluster my-cluster \
    --service my-service \
    --enable-execute-command \
    --force-new-deployment

Step 4 — Verify:

aws ecs describe-tasks \
    --cluster my-cluster \
    --tasks $(aws ecs list-tasks --cluster my-cluster --service-name my-service --query 'taskArns[0]' --output text)
# Look for: "enableExecuteCommand": true, ExecuteCommandAgent "lastStatus": "RUNNING"

Step 5 — Execute:

# Interactive shell
aws ecs execute-command \
    --cluster my-cluster \
    --task YOUR_TASK_ID \
    --container nginx \
    --command "/bin/bash" \
    --interactive

# Single command (not run through a shell, so wrap pipes in sh -c)
aws ecs execute-command \
    --cluster my-cluster \
    --task YOUR_TASK_ID \
    --container nginx \
    --command "sh -c 'env | grep DATABASE'" \
    --interactive
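Steps 4 and 5 can be combined so you don't copy task IDs by hand. A sketch, assuming AWS CLI v2 with credentials; exec_into_service and task_id_from_arn are hypothetical helpers. The --task option accepts a task ID, which is the last segment of the task ARN. The sketch opens /bin/sh because not every image ships bash.

```bash
# Hypothetical helpers; assumes AWS CLI v2 with credentials.
# The task ID is the last path segment of the task ARN.
task_id_from_arn() { echo "${1##*/}"; }

# Open a shell in the first RUNNING task of a service.
exec_into_service() {
  local cluster="$1" service="$2" container="$3" arn
  arn=$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
    --desired-status RUNNING --query 'taskArns[0]' --output text)
  if [ -z "$arn" ] || [ "$arn" = "None" ]; then
    echo "no running tasks for ${service}" >&2
    return 1
  fi
  aws ecs execute-command --cluster "$cluster" --task "$(task_id_from_arn "$arn")" \
    --container "$container" --command "/bin/sh" --interactive
}
```

Usage: exec_into_service my-cluster my-service nginx.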

Production setup — logging, audit, access control

Three layers for production:

Layer 1 — Log command output

aws ecs update-cluster \
    --cluster my-cluster \
    --configuration '{
      "executeCommandConfiguration": {
        "logging": "OVERRIDE",
        "logConfiguration": {
          "cloudWatchLogGroupName": "/aws/ecs/my-cluster-exec",
          "s3BucketName": "my-exec-logs",
          "s3KeyPrefix": "exec-output"
        }
      }
    }'

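CloudTrail records who called ExecuteCommand; the CloudWatch Logs group and S3 bucket capture the session output. The task role also needs permission to write to both destinations.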
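For the CloudTrail side, event history can be searched by event name. A sketch, assuming AWS CLI v2 with credentials in the region you care about; recent_exec_calls is a hypothetical helper. Event history only covers the last 90 days of management events, so use a trail for longer retention.

```bash
# Hypothetical helper; assumes AWS CLI v2 with credentials.
# List time and caller for recent ExecuteCommand calls (default: 10).
recent_exec_calls() {
  local max="${1:-10}"
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=ExecuteCommand \
    --max-results "$max" \
    --query 'Events[].[EventTime, Username]' --output text
}
```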

Layer 2 — Restrict by environment tag

{
  "Effect": "Allow",
  "Action": "ecs:ExecuteCommand",
  "Resource": [
    "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",
    "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/*"
  ],
  "Condition": {
    "StringEquals": {
      "ecs:ResourceTag/environment": "development"
    }
  }
}
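The tag condition only matches tasks that actually carry the tag, and tasks get tags at launch (for example, when the service propagates them with propagateTags). A sketch to check one task, assuming AWS CLI v2 with credentials; the helper names and the task ARN are placeholders.

```bash
# Hypothetical helpers; assumes AWS CLI v2 with credentials.
# Print a task's "environment" tag value ("None" if the tag is absent).
task_environment_tag() {
  aws ecs list-tags-for-resource --resource-arn "$1" \
    --query "tags[?key=='environment'].value | [0]" --output text
}

require_environment() {
  local arn="$1" want="$2" got
  got=$(task_environment_tag "$arn")
  if [ "$got" != "$want" ]; then
    echo "task has environment=${got}; the policy expects ${want}" >&2
    return 1
  fi
  echo "task tagged environment=${got}"
}
```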

Layer 3 — Block production by container name

{
  "Effect": "Deny",
  "Action": "ecs:ExecuteCommand",
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "ecs:container-name": "production-app"
    }
  }
}
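Before rolling out the Deny, you can ask the IAM policy simulator how it evaluates for a given principal and container name. A sketch, assuming AWS CLI v2 with credentials and placeholder ARNs; simulate_exec is hypothetical. It supplies only the ecs:container-name context key, so other conditions (such as the tag condition in Layer 2) may not evaluate as they would for a real request. With the Layer 3 statement attached, production-app should come back as explicitDeny.

```bash
# Hypothetical helper; assumes AWS CLI v2 with credentials.
# Prints allowed, explicitDeny, or implicitDeny for one container name.
simulate_exec() {
  local principal="$1" task_arn="$2" container="$3"
  aws iam simulate-principal-policy \
    --policy-source-arn "$principal" \
    --action-names ecs:ExecuteCommand \
    --resource-arns "$task_arn" \
    --context-entries "ContextKeyName=ecs:container-name,ContextKeyValues=${container},ContextKeyType=string" \
    --query 'EvaluationResults[0].EvalDecision' --output text
}
```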

What ECS Exec can't do

| Limitation | Why it matters |
|---|---|
| 20-minute idle timeout | Not configurable; active commands keep the session alive |
| 1 session per PID namespace | A second session fails until the first exits |
| Must be enabled at launch | Can't be enabled retroactively on running tasks |
| Read-only root FS breaks it | The SSM agent writes to /var/lib/amazon/ssm/ |
| Commands run as root | The container's USER directive is ignored |
| No AWS Console support | CLI/SDK only |
| Only tools in the image | No debug tools are injected |

"ECS Exec sessions drop after 20 minutes of idle time — this timeout is not configurable. Only one session per container PID namespace is supported, and sessions always run as root regardless of the container USER directive." — AWS ECS Exec documentation, verified June 2026


FAQ

Does ECS Exec work on Fargate Spot?
Yes. The Spot interruption risk means you might lose your exec session mid-debug, but the feature works identically on Spot and On-Demand.

How much does ECS Exec cost?
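ECS Exec itself is free, and so is SSM Session Manager. Costs come from what you attach: CloudWatch Logs or S3 storage if you enable session logging, about $1/month per customer managed KMS key (plus request charges) if you encrypt sessions, and hourly charges for any interface VPC endpoints you create.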

Can I use ECS Exec to run a one-off command?
Yes: aws ecs execute-command --command 'ls -la' --interactive. ECS Exec only supports interactive sessions, so --interactive is required even for one-off commands.

How do I restrict ECS Exec to specific IAM users?
Use IAM condition keys on ecs:ExecuteCommand. Restrict by cluster name, task tags, container name. Example: allow exec only on tasks tagged environment=development.

What happens to ECS Exec when I update the service?
If you update without --enable-execute-command, new tasks will NOT have ECS Exec. Always include the flag in your update-service calls, or manage it via IaC.
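A post-deploy check (for example in CI) can catch a service that lost the setting. A sketch, assuming AWS CLI v2 with credentials; service_exec_enabled and assert_service_exec are hypothetical helpers that read the service's enableExecuteCommand field from describe-services.

```bash
# Hypothetical helpers; assumes AWS CLI v2 with credentials.
# Read the service-level enableExecuteCommand flag (prints true or false).
service_exec_enabled() {
  aws ecs describe-services --cluster "$1" --services "$2" \
    --query 'services[0].enableExecuteCommand' --output json
}

assert_service_exec() {
  if [ "$(service_exec_enabled "$1" "$2")" = "true" ]; then
    echo "ECS Exec enabled on $2"
  else
    echo "ECS Exec disabled on $2: new tasks will not support it" >&2
    return 1
  fi
}
```

Usage: assert_service_exec my-cluster my-service.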


Full article with downloadable skill file: fortem.dev/blog/ecs-exec-guide
