Why can't I connect to my ElastiCache for Redis cluster?


I can't connect to my Amazon ElastiCache for Redis cluster. How can I troubleshoot this?

Short description

Connectivity issues can have multiple root causes. The most common causes are:

  • The cluster isn't ready.
  • The cluster is unhealthy.
  • The network configuration is incorrect.
  • The client configuration is incorrect.

Resolution

Verify that the cluster is ready

If you recently created the cluster, verify that the cluster creation completed and that the cluster is ready to accept connections.

Check the status of the cluster using the ElastiCache console, the AWS Command Line Interface (AWS CLI), or the ElastiCache API. Review the Status column for the following:

  • If the Status column shows Available, the cluster is ready.
  • If the Status column shows Creating, the cluster is still being created. Wait a few minutes until the status changes to Available.
  • If the Status column shows Modifying, the cluster's configuration is updating. Wait a few minutes until the modifications finish and the status changes to Available.
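The status checks above can be sketched as a small helper for scripts that poll the cluster. This is a minimal sketch, not an official tool: the function name next_action and its return values are hypothetical, and it assumes the API reports the same status names as the console, in lowercase.

```python
# Sketch: map a cluster status string to the next troubleshooting action.
# Assumption: the status arrives as text such as "available", "creating",
# or "modifying" (the console shows the same names capitalized).

def next_action(status: str) -> str:
    """Return "ready", "wait", or "investigate" for a cluster status."""
    normalized = status.strip().lower()
    if normalized == "available":
        return "ready"          # the cluster accepts connections
    if normalized in ("creating", "modifying"):
        return "wait"           # retry in a few minutes
    return "investigate"        # any other status needs a closer look


# Possible usage with boto3, if it is installed ("my-cluster" is a placeholder):
#   import boto3
#   groups = boto3.client("elasticache").describe_replication_groups(
#       ReplicationGroupId="my-cluster")["ReplicationGroups"]
#   print(next_action(groups[0]["Status"]))
```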

Verify that the cluster is healthy

In a healthy cluster, each individual node is in the Available state. To verify the cluster's health, check the status of each node in the ElastiCache console, the AWS CLI, or the ElastiCache API.

Verify network-level connectivity between the cluster and the client resource

To minimize latency, access ElastiCache from Amazon Elastic Compute Cloud (Amazon EC2) instances. Accessing Amazon ElastiCache from other resources within the same Amazon Virtual Private Cloud (Amazon VPC) also helps minimize latency. However, it's possible to connect from outside of the VPC, or even outside of AWS.

For more information on connecting to ElastiCache, see Accessing your cluster or replication group.

To help troubleshoot connectivity issues between AWS resources, you can use the VPC Network Access Analyzer service.

Verify that security groups and network ACLs allow connections

Perform this step on the ElastiCache cluster and on the resource that's initiating the connection. Examples of client resources are:

  • An Amazon EC2 instance.
  • An AWS Lambda function.
  • An Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS) container, and so on.

In ElastiCache, make sure that the security groups are configured correctly. For other resources, verify security groups and network ACLs.

Confirm the security group on the ElastiCache cluster

1.    Select the cluster name from the Redis clusters menu, and then select the Network and security tab.

2.    Verify that at least one of the associated security groups allows inbound connections from the client resource to the cluster on the cluster's port.

3.    To confirm the port number, check any of the endpoints of the cluster. The endpoints are in the domain_name:port format.

Note: The cluster's port is 6379/TCP by default. You can override the port number during cluster creation.

4.    On the client resource, verify that the security groups allow outbound connections to the cluster's port and to the CIDR blocks of the cluster's subnets.

5.    Verify that the network ACLs allow outgoing and incoming connections between the client and the cluster. The default network ACL allows all connections unless it has been modified.

Note: You can use the VPC Network Access Analyzer service to troubleshoot security group and network ACL configurations.
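Because the security group rules must match the port in the cluster's endpoint, it can help to extract that port in a script. The following is a minimal sketch that splits an endpoint in the domain_name:port format; the function name parse_endpoint is hypothetical, and falling back to port 6379 when no port is given is an assumption based on the default noted above.

```python
from typing import Tuple

DEFAULT_REDIS_PORT = 6379  # default cluster port, per the note above


def parse_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split "domain_name:port" into (host, port)."""
    cleaned = endpoint.strip()
    host, separator, port = cleaned.rpartition(":")
    if not separator:
        # No port in the string: assume the default port.
        return cleaned, DEFAULT_REDIS_PORT
    return host, int(port)
```

For example, parse_endpoint("test.1234id.clustercfg.euw1.cache.amazonaws.com:6379") returns the host name and the integer 6379, which is the port that the inbound and outbound rules need to allow.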

Identify the correct endpoint for connections

The recommended connection endpoints differ between cluster configurations. For more information on finding the correct endpoints and possible configurations, see Finding connection endpoints.

Verify that DNS resolution works on the client side

DNS issues are commonly identified by the Name or service not known and NXDOMAIN error messages.

$ nslookup nonexistent.1234id.clustercfg.euw1.cache.amazonaws.com
Server:         172.31.0.2
Address:        172.31.0.2#53

** server can't find nonexistent.1234id.clustercfg.euw1.cache.amazonaws.com: NXDOMAIN
$ redis-cli -h nonexistent.1234id.clustercfg.euw1.cache.amazonaws.com
Could not connect to Redis at nonexistent.1234id.clustercfg.euw1.cache.amazonaws.com:6379: Name or service not known

If you see the preceding errors, verify that the endpoint name is correct, and then check the DNS attributes of the VPC that contains the client resource.

It's a best practice to use the Amazon DNS server, if possible. For more information, see Amazon DNS server.
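If nslookup isn't available on the client, the same DNS check can be approximated with the Python standard library. This is a minimal sketch; the function name resolve is hypothetical. An empty result corresponds to the Name or service not known error shown above.

```python
import socket
from typing import List


def resolve(hostname: str, port: int = 6379) -> List[str]:
    """Return the IP addresses that the hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The resolver could not find the name (or could not be reached).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in infos})
```

Run it on the client resource itself, because resolution depends on the DNS settings of that resource and its VPC.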

Verify TCP connectivity between the client and Redis

Use the curl or telnet command to establish a TCP connection and verify connectivity:

$ curl -v telnet://test.1234id.clustercfg.euw1.cache.amazonaws.com:6379
*  Trying 172.31.1.242:6379...
* Connected to test.1234id.clustercfg.euw1.cache.amazonaws.com (172.31.1.242) port 6379 (#0)

In the preceding example, the Connected keyword shows that the TCP connection works.

If Connected doesn't appear in the command results, check the following:

On the ElastiCache cluster

  • Security groups must allow connections on the cluster's port. Verify the port value on the cluster configuration page (default is 6379/TCP). For more information, see Modifying an ElastiCache cluster.
  • The cluster and all of its shards and nodes must be in the Available state. For more information, see Viewing a cluster's details.

On the client resource

  • Security groups must allow outgoing connections to the cluster's IP and port.
  • Routing tables must have the appropriate routes so that the cluster is reachable.
  • The resource can be in the same VPC. Or, if the resource is in another VPC or outside of AWS, make sure that it has the appropriate connection configured. This might be a VPN connection, VPC peering, AWS Direct Connect, and so on. For more information, see Accessing your cluster or replication group.
    Note: Amazon ElastiCache is designed to be accessed from the same VPC to ensure low latency. Connections outside of the VPC introduce extra latency. This extra latency is especially common with connections using the public Internet either directly or through tunneling. Because Redis is very latency-sensitive, the extra latency might cause connectivity and time-out issues.

The VPC Reachability Analyzer is a tool to help determine what is blocking access.
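The TCP check from this section can also be scripted, for example from a Lambda function or a container where curl and telnet aren't installed. The following is a minimal sketch that uses only the Python standard library; the function name can_connect and the 3-second timeout are assumptions, not recommended values.

```python
import socket


def can_connect(host: str, port: int = 6379, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

A False result only shows that the TCP connection failed. Use the checks listed above to find out whether security groups, routing, or the cluster state is the cause.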

Troubleshoot connecting to clusters with in-transit encryption

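In-transit encryption works by sending Redis traffic over TLS. The client must have TLS support for the connection to work. For example, the following command doesn't use TLS, so it can't communicate with a cluster that has in-transit encryption enabled: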

$ redis-cli -h encrypted.1234id.clustercfg.euw1.cache.amazonaws.com

If you have redis-cli installed and it has TLS support, then add the --tls argument to the command:

$ redis-cli -h encrypted.1234id.clustercfg.euw1.cache.amazonaws.com --tls
encrypted.1234id.clustercfg.euw1.cache.amazonaws.com:6379>

If redis-cli was compiled without TLS support, the following error displays:

$ redis-cli -h encrypted.1234id.clustercfg.euw1.cache.amazonaws.com --tls
Unrecognized option or bad number of args for: '--tls'

To troubleshoot the preceding error, do one of the following:

Compile redis-cli with TLS support. This is a best practice if you plan to continue using redis-cli. For steps for Amazon Linux 2 and Amazon Linux, see Download and install redis-cli in Step 4: Connect to the cluster's node.

-or-

Use an alternate command, such as openssl. The openssl command is available on most systems and is useful if a redis-cli with TLS support isn't available. The following is an example of the openssl command:

$ openssl s_client -connect encrypted.1234id.clustercfg.euw1.cache.amazonaws.com:6379
CONNECTED(00000003)
----- omitted --------
INFO
# Server
redis_version:6.2.6
----- omitted -----

For more information, see the Connecting to an Encryption/Authentication enabled cluster section in Step 4: Connect to the cluster's node.
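As another alternative to redis-cli and openssl, the TLS test can be sketched with the Python standard library. This is an illustration under stated assumptions, not an official client: the function names encode_command and tls_ping are hypothetical, and the sketch assumes that the client's default trust store can verify the cluster's certificate. The command is encoded in the Redis serialization protocol (an array of bulk strings).

```python
import socket
import ssl


def encode_command(*args: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [b"*" + str(len(args)).encode() + b"\r\n"]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def tls_ping(host: str, port: int = 6379, timeout: float = 3.0) -> str:
    """Open a TLS connection, send PING, and return the raw first reply."""
    context = ssl.create_default_context()  # verifies certificate and host name
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(encode_command("PING"))
            return tls.recv(1024).decode(errors="replace").strip()
```

Calling tls_ping with the cluster's endpoint raises an exception if the TCP connection or the TLS handshake fails. If the call returns a reply, then TLS works and any remaining issue is at the Redis level, such as authentication.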

Troubleshoot connecting to clusters with authentication

redis-cli

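All clusters with authentication require TLS. The redis-cli command requires the --tls argument and either the --askpass argument, which prompts for the password, or the -a argument, which takes the password on the command line.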

If the --askpass argument isn't provided, then you receive the following output:

$ redis-cli -h auth-cluster.1234id.clustercfg.euw1.cache.amazonaws.com --tls
auth-cluster.1234id.clustercfg.euw1.cache.amazonaws.com:6379> INFO # or any other Redis command
NOAUTH Authentication required.

If you enter an incorrect password, then you receive the following output:

$ redis-cli -h auth-cluster.1234id.clustercfg.euw1.cache.amazonaws.com --tls --askpass
Please input password: *************
Warning: AUTH failed

The following example shows a successful connection with the correct password:

$ redis-cli -h auth-cluster.1234id.clustercfg.euw1.cache.amazonaws.com --tls --askpass
Please input password: ******************
auth-cluster.1234id.clustercfg.euw1.cache.amazonaws.com:6379> INFO
# Server
redis_version:6.2.6
----- omitted -----

openssl

You can test connectivity using the openssl command. Use this command for debugging purposes only:

$ openssl s_client -connect master.auth-cluster.3i1yig.euw1.cache.amazonaws.com:6379
CONNECTED(00000003)
----- omitted -----
---
AUTH topsecretpassword
+OK
INFO
# Server
redis_version:6.2.6
----- omitted -----

For more information, see the Connecting to an Encryption/Authentication enabled cluster section in Step 4: Connect to the cluster's node.
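When you script these checks, it helps to tell the authentication outcomes in this section apart. The following is a minimal sketch that classifies reply text such as the outputs shown above. The function name classify_reply and its labels are hypothetical. The WRONGPASS and invalid password patterns are assumptions about server replies that this article doesn't show, and they can vary by Redis version.

```python
def classify_reply(reply: str) -> str:
    """Classify a Redis or redis-cli reply related to authentication."""
    # Drop the RESP type markers ("+" for status, "-" for error), if present.
    text = reply.strip().lstrip("+-")
    if text.startswith("OK"):
        return "authenticated"
    if text.startswith("NOAUTH"):
        return "password-required"   # no password was sent
    if (text.startswith("WRONGPASS")          # assumed server reply
            or "invalid password" in text     # assumed server reply
            or "AUTH failed" in text):        # redis-cli warning shown above
        return "wrong-password"
    return "other"
```

For example, classify_reply("NOAUTH Authentication required.") returns "password-required", which matches the case where the --askpass argument was omitted.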

For additional details on troubleshooting ElastiCache connectivity, see Troubleshooting.


AWS OFFICIAL: Updated 2 years ago