Why are my Amazon ECS container instances with Amazon Linux 1 AMIs disconnected?

My container instances for Amazon Elastic Container Service (Amazon ECS) are disconnected.

Short description

Your Amazon ECS container agent might disconnect and reconnect several times an hour. These connection changes are normal and aren't a cause for concern.

However, if your container agent stays disconnected, then the container instance can't operate as part of your ECS cluster. The agent is disconnected when the agentConnected value for the container instance is false. This issue can have the following causes:

  • Networking issues prevent communication between the instance and Amazon ECS.
  • The container agent doesn't have the required AWS Identity and Access Management (IAM) permissions to communicate with Amazon ECS endpoints.
  • There are problems with the host or Docker service inside the container instance.
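To see which container instances report a disconnected agent, you can read the agentConnected value with the AWS CLI. The following is a minimal sketch. It assumes that the AWS CLI is installed and configured with permission to call ecs:ListContainerInstances and ecs:DescribeContainerInstances. The function name report_agent_status is hypothetical. DescribeContainerInstances accepts at most 100 instances per call, so a larger cluster would need batching.

```shell
# Minimal sketch: print the EC2 instance ID and agentConnected value for each
# container instance in a cluster. Assumes a configured AWS CLI with
# ecs:ListContainerInstances and ecs:DescribeContainerInstances permissions.
# report_agent_status is a hypothetical helper name.
# Usage: report_agent_status <cluster-name>
report_agent_status() {
  local cluster="$1"
  local arns
  # Collect the ARNs of the container instances registered to the cluster.
  arns=$(aws ecs list-container-instances \
    --cluster "$cluster" \
    --query 'containerInstanceArns[]' \
    --output text) || return 1
  if [ -z "$arns" ]; then
    echo "No container instances found in cluster $cluster" >&2
    return 1
  fi
  # $arns is unquoted on purpose so that each ARN becomes its own argument.
  # DescribeContainerInstances accepts at most 100 instances per call.
  aws ecs describe-container-instances \
    --cluster "$cluster" \
    --container-instances $arns \
    --query 'containerInstances[].[ec2InstanceId,agentConnected]' \
    --output text
}
```

For example, report_agent_status exampleCluster prints one line per instance. Any instance whose agentConnected value is false is the one to troubleshoot with the following steps.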

To identify the cause of the disconnection, complete the following steps.

Resolution

Note: The following resolution applies to Amazon ECS-optimized Amazon Linux 1 AMIs. For a resolution that applies to Amazon ECS-optimized Amazon Linux 2 AMIs, see How do I troubleshoot a disconnected Amazon ECS agent?

Verify that the Docker service is running on the container instance

1.    To verify that the Docker service is running on the affected container instance, run the following command:

sudo service docker status

The command output is similar to the following:

docker (pid 23013) is running...

If the Docker service isn't running, or if you need to restart the service, run the following command:

sudo service docker restart

Note: Don't restart the Docker service while it's running tasks. First, set the container instance to the DRAINING state so that Amazon ECS schedules its existing tasks on other container instances. Then, restart the Docker service.

If the restart succeeds, then the command output includes the following lines:

Stopping docker: [  OK  ]
Starting docker: [  OK  ]

Note: To verify that the Docker service is running after the restart command, run the sudo service docker status command.

2.    To start the ECS agent, run the following command:

sudo start ecs
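The note in step 1 recommends that you set the instance to DRAINING before you restart a running Docker service. The following minimal sketch sets that state with the AWS CLI and then polls runningTasksCount until it reaches zero or a timeout expires. It assumes that the AWS CLI is configured with ecs:UpdateContainerInstancesState and ecs:DescribeContainerInstances permissions. The function name drain_instance, the 10-second poll interval, and the 600-second default timeout are hypothetical choices.

```shell
# Minimal sketch: set a container instance to DRAINING, then wait until its
# runningTasksCount reaches zero before you restart Docker.
# Assumes a configured AWS CLI with ecs:UpdateContainerInstancesState and
# ecs:DescribeContainerInstances. drain_instance is a hypothetical helper name.
# Usage: drain_instance <cluster> <container-instance-arn> [timeout-seconds]
drain_instance() {
  local cluster="$1" instance="$2" timeout="${3:-600}"
  local waited=0 running
  aws ecs update-container-instances-state \
    --cluster "$cluster" \
    --container-instances "$instance" \
    --status DRAINING >/dev/null || return 1
  while :; do
    running=$(aws ecs describe-container-instances \
      --cluster "$cluster" \
      --container-instances "$instance" \
      --query 'containerInstances[0].runningTasksCount' \
      --output text) || return 1
    # Zero running tasks means the tasks have moved off this instance.
    [ "$running" = "0" ] && return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out with $running task(s) still running" >&2
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
}
```

After Docker and the agent run again, you can return the instance to service with aws ecs update-container-instances-state and --status ACTIVE.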

Verify that the container agent is running on the container instance

To verify that the container agent is running on the affected container instance, run the following command:

sudo status ecs

If the container agent isn't running on your container instance, then run the following command to start the agent:

sudo start ecs

The command output is similar to the following:

ecs start/running, process 23403
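On Amazon Linux 1, Docker is managed with the service command and the agent with the Upstart commands status and start. You can combine the Docker and agent checks from the previous sections into one function. The following is a minimal sketch that runs on the container instance with sudo access. It starts a stopped service but never restarts a running one, which is consistent with the draining note. It assumes that the Docker init script returns a non-zero exit status when Docker is stopped, as LSB-style init scripts do. The function name ensure_docker_and_agent is hypothetical.

```shell
# Minimal sketch for Amazon Linux 1: start Docker if it is stopped, then start
# the ECS agent (Upstart job "ecs") if it isn't running. Never restarts a
# running service. ensure_docker_and_agent is a hypothetical helper name.
ensure_docker_and_agent() {
  # Assumes the init script exits non-zero when Docker isn't running.
  if ! sudo service docker status >/dev/null 2>&1; then
    echo "Docker is not running; starting it" >&2
    sudo service docker start || return 1
  fi
  # Upstart reports "ecs start/running, process N" for a running agent.
  if ! sudo status ecs 2>/dev/null | grep -q 'start/running'; then
    echo "ECS agent is not running; starting it" >&2
    sudo start ecs || return 1
  fi
  # Show the final agent state.
  sudo status ecs
}
```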

Review log files for the container agent and Docker

If your container instances are still disconnected, review the log files on the container host for the container agent and Docker.

To output the log files for the container agent and Docker, run the following commands:

sudo cat /var/log/ecs/ecs-agent.log.YYYY-MM-DD-**
sudo cat /var/log/docker

Note: To collect log information from the container instance, run the Amazon ECS logs collector.
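The agent log rotates into many files, so you might find it easier to filter for errors than to read each file with cat. The following minimal sketch prints the [ERROR] lines from every ecs-agent.log file, with the oldest file first. It defaults to /var/log/ecs, the directory used in this article, and reading that directory might require root. The function name agent_log_errors and its optional directory argument are hypothetical.

```shell
# Minimal sketch: print every [ERROR] line from the ECS agent log files,
# oldest file first. Defaults to /var/log/ecs; reading it may require root.
# agent_log_errors is a hypothetical helper name.
# Usage: agent_log_errors [log-dir]
agent_log_errors() {
  local dir="${1:-/var/log/ecs}"
  # ls -tr lists the matching files by modification time, oldest first.
  ls -1tr "$dir"/ecs-agent.log* 2>/dev/null | while IFS= read -r f; do
    # -H prefixes each match with its file name; -F matches [ERROR] literally.
    grep -HF '[ERROR]' "$f"
  done
  return 0
}
```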

Verify that the IAM instance profile has the necessary permissions

If the container agent is still disconnected, verify that the IAM instance profile associated with the container instance has the necessary IAM permissions.

1.    Connect to the instance using SSH.

2.    To view the instance metadata for the instance profile that's associated with the instance, run the following command:

curl http://169.254.169.254/latest/meta-data/iam/info

The command output is similar to the following:

{
  "Code" : "Success",
  "LastUpdated" : "2019-06-29T15:47:03Z",
  "InstanceProfileArn" : "arn:aws:iam::1122334455:instance-profile/ecsInstanceRole",
  "InstanceProfileId" : "AIPAJ5WF3LZVY7PLUHV72"
}

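3.    Verify that the IAM role in the instance profile has the permissions that the container agent needs to communicate with Amazon ECS, such as the permissions in the AWS managed policy AmazonEC2ContainerServiceforEC2Role.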

4.    To check for specific credential errors, review the container agent log:

sudo cat /var/log/ecs/ecs-agent.log.YYYY-MM-DD-**

Note: The container agent log rotates every hour, and the file name suffix reflects the date and hour that each log file covers. Update the suffix in the command to match the date and time when the issue occurred.

If the container agent doesn't have the necessary permissions, then the log contains errors similar to the following:

2019-06-29T16:10:09Z [ERROR] Unable to register as a container instance with ECS: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster
    status code: 400, request id: 0b73e260-5088-4688-a425-6f35f1ef440f
2019-06-29T16:10:09Z [ERROR] Error re-registering: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster
    status code: 400, request id: 0b73e260-5088-4688-a425-6f35f1ef440f
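The curl request in step 2 doesn't send an Instance Metadata Service Version 2 (IMDSv2) session token, so it fails on instances that require IMDSv2. The following minimal sketch requests a token first and then reads the instance role name. It also checks whether the AWS managed policy AmazonEC2ContainerServiceforEC2Role is attached to that role. That policy is one common way to grant the agent its permissions, so a role that grants them through other policies needs a different check. Run has_ecs_managed_policy with credentials that allow iam:ListAttachedRolePolicies, which the instance role itself might not have. Both function names are hypothetical.

```shell
# Minimal sketch. instance_role_name and has_ecs_managed_policy are
# hypothetical helper names.
IMDS="http://169.254.169.254/latest"

# Print the IAM role name that the instance profile provides, using IMDSv2.
# Run this on the container instance.
instance_role_name() {
  local token
  # Request an IMDSv2 session token that is valid for 6 hours (21600 seconds).
  token=$(curl -sf -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") || return 1
  # The security-credentials/ path lists the name of the instance role.
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS/meta-data/iam/security-credentials/"
}

# Succeed if AmazonEC2ContainerServiceforEC2Role is attached to the role.
# Needs credentials that allow iam:ListAttachedRolePolicies.
has_ecs_managed_policy() {
  aws iam list-attached-role-policies \
    --role-name "$1" \
    --query 'AttachedPolicies[].PolicyName' \
    --output text | tr '\t' '\n' | grep -qx 'AmazonEC2ContainerServiceforEC2Role'
}
```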

Additional help

If you can't identify the issue with your ECS container instance from this resolution, then contact Premium Support for help. First, use the Amazon ECS logs collector to create an archive of your instance's logs. Then, attach the logs to a support ticket to help the support engineer troubleshoot the issue.
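The following minimal sketch downloads and runs the logs collector on the container instance. It assumes that the script is still published at the awslabs/ecs-logs-collector GitHub repository path in the code. Review the script before you run it with sudo, and check its output for the location of the log archive that it creates. The function name collect_ecs_logs is hypothetical.

```shell
# Minimal sketch: download the Amazon ECS logs collector and run it as root.
# The URL assumes the script's location in the awslabs/ecs-logs-collector
# repository. collect_ecs_logs is a hypothetical helper name.
collect_ecs_logs() {
  # -f fails on HTTP errors, -L follows redirects, -O keeps the remote name.
  curl -fsSLO \
    https://raw.githubusercontent.com/awslabs/ecs-logs-collector/master/ecs-logs-collector.sh \
    || return 1
  sudo bash ./ecs-logs-collector.sh
}
```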


Related information

Amazon ECS troubleshooting

Amazon ECS container agent

Amazon ECS container instance IAM role

Amazon ECS log file locations

AWS OFFICIAL
Updated 2 years ago