Why does the rollover index action in my ISM policy keep failing in OpenSearch Service?

I want to use Index State Management (ISM) to roll over my indices on my Amazon OpenSearch Service cluster. However, my index fails to roll over, and I receive an error.

Short description

The following issues can cause the "Failed to rollover index" error:

  • The rollover target doesn't exist.
  • The rollover alias is missing.
  • The index name doesn't match the index pattern.
  • The rollover alias points to a duplicated alias in an index template.
  • You have maximum resource utilization on your cluster.

To resolve this issue, use the explain API to identify the cause of your error. Then, check your ISM policy. For more information about setting up the rollover action, see How do I use ISM to manage low storage space in OpenSearch Service?

Note: The following resolution applies only to the OpenSearch API. For the legacy Open Distro API, refer to ISM API on the Open Distro website.

Resolution

Use the explain API to identify the cause

To identify the root cause of your "Failed to rollover index" error, use the explain API:

GET _plugins/_ism/explain/logs-000001?pretty

Example output of the explain API:

{
     "logs-000001": {
          "index.plugins.index_state_management.policy_id": "rollover-workflow",
          "index": "logs-000001",
          "index_uuid": "JUWl2CSES2mWYXqpJJ8qlA",
          "policy_id": "rollover-workflow",
          "policy_seq_no": 2,
          "policy_primary_term": 1,
          "rolled_over": false,
          "state": {
               "name": "open",
               "start_time": 1614738037066
          },
          "action": {
               "name": "rollover",
               "start_time": 1614739372091,
               "index": 0,
               "failed": true,
               "consumed_retries": 0,
               "last_retry_time": 0
          },
          "retry_info": {
               "failed": false,
               "consumed_retries": 0
          },
          "info": {
               "cause": "rollover target [rolling-indices] does not exist",
               "message": "Failed to rollover index [index=logs-000001]"
          }
     }
}

This example output shows that the logs-000001 index failed to roll over because the target rollover alias, rolling-indices, doesn't exist.
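If you manage many indices, then you might want to scan the explain API output for failed actions instead of reading it by hand. The following is a minimal sketch that assumes you already have the explain API response as a Python dictionary. The function name failed_actions is hypothetical, and the sample is a trimmed copy of the example output in this article.

```python
import json


def failed_actions(explain_response: dict) -> dict:
    """Map each managed index whose current ISM action failed to its failure cause."""
    failures = {}
    for index_name, details in explain_response.items():
        # The response may also carry top-level counters (for example,
        # "total_managed_indices"); only per-index objects are inspected.
        if not isinstance(details, dict):
            continue
        if details.get("action", {}).get("failed"):
            info = details.get("info", {})
            # Prefer the "cause" field and fall back to "message" if it's absent.
            failures[index_name] = info.get("cause", info.get("message", "unknown"))
    return failures


# Trimmed copy of the example explain API output from this article.
sample = json.loads("""
{
  "logs-000001": {
    "action": {"name": "rollover", "failed": true},
    "info": {
      "cause": "rollover target [rolling-indices] does not exist",
      "message": "Failed to rollover index [index=logs-000001]"
    }
  }
}
""")

print(failed_actions(sample))
```

The cause string tells you which of the following sections applies.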

The rollover target doesn't exist

If the cause is "rollover target [rolling-indices] does not exist", then check whether the index was bootstrapped with the rollover alias:

GET _cat/aliases

The output lists all the current aliases in the cluster and their associated indices. If the failed index isn't listed with the rollover alias that ISM reports as missing, then the rollover alias isn't associated with the index.
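To check the association programmatically, you can request the same output as JSON with GET _cat/aliases?format=json, which returns one object per alias and index pair. The following sketch assumes that format and reads only the "alias" and "index" fields. The helper name alias_targets and the sample rows are hypothetical.

```python
def alias_targets(cat_aliases_rows: list, alias: str) -> set:
    """Return the indices that `alias` points to, from GET _cat/aliases?format=json rows."""
    return {row["index"] for row in cat_aliases_rows if row.get("alias") == alias}


# Hypothetical output: the index is attached to a different alias than the
# rollover target that ISM reported.
rows = [{"alias": "my-data", "index": "logs-000001"}]

if "logs-000001" not in alias_targets(rows, "rolling-indices"):
    print("rolling-indices is not attached to logs-000001")
```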

To resolve the issue, attach the rollover alias that's named in the error message to the index. In this example, the rollover alias is rolling-indices:

POST /_aliases
{
     "actions": [{
          "add": {
               "index": "logs-000001",
               "alias": "rolling-indices"
          }
     }]
}

After you attach the rollover alias, retry the rollover action on the managed index in OpenSearch Service:

POST _plugins/_ism/retry/logs-000001

For more information, see Retry failed index on the OpenSearch website.

After you retry the failed index, the explain API might show an "Attempting to retry" status message. Wait for the next ISM cycle to run. In OpenSearch Service, ISM cycles run every 30 to 48 minutes. If the rollover action succeeds, then the status message changes to "Successfully rolled over index".
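Because an ISM cycle can take up to 48 minutes, a script that waits for the result needs to poll patiently. The following sketch polls the explain API until the index reports "rolled_over": true or a "Successfully rolled over" message. It assumes a hypothetical fetch_explain callable that you implement with your own HTTP client to send GET _plugins/_ism/explain/<index> and return the parsed JSON. The function wait_for_rollover and its defaults are also hypothetical.

```python
import time
from typing import Callable


def wait_for_rollover(
    fetch_explain: Callable[[str], dict],
    index: str,
    poll_seconds: float = 300,
    max_polls: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the explain API until the index reports a rollover or polls run out.

    The defaults (12 polls, 5 minutes apart) cover about an hour, which is longer
    than one 30 to 48 minute ISM cycle.
    """
    for _ in range(max_polls):
        details = fetch_explain(index).get(index, {})
        message = details.get("info", {}).get("message", "")
        if details.get("rolled_over") or message.startswith("Successfully rolled over"):
            return True
        sleep(poll_seconds)
    return False
```

The sleep function is injectable so that you can test the loop without waiting.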

The rollover alias is missing

If the cause of your rollover failure is a missing rollover alias, then check the settings of the failed index:

GET <failed-index-name>/_settings

If you see that the index.plugins.index_state_management.rollover_alias setting is missing, then manually add the setting to your index:

PUT /<failed-index-name>/_settings
{
     "index.plugins.index_state_management.rollover_alias" : "<rollover-alias>"
}
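To find every index that's missing the setting, you can request the settings in flat form with GET <index-pattern>/_settings?flat_settings=true, which maps each index name to a "settings" object with dotted keys. The following sketch assumes that response shape. The helper names are hypothetical.

```python
ROLLOVER_ALIAS_KEY = "index.plugins.index_state_management.rollover_alias"


def indices_missing_rollover_alias(settings_response: dict) -> list:
    """Return index names whose settings lack the ISM rollover_alias setting.

    Expects GET <index>/_settings?flat_settings=true output, which maps each index
    name to {"settings": {"<flat.setting.key>": "<value>", ...}}.
    """
    return sorted(
        name
        for name, body in settings_response.items()
        if not body.get("settings", {}).get(ROLLOVER_ALIAS_KEY)
    )


def rollover_alias_settings_body(alias: str) -> dict:
    """Body for PUT /<failed-index-name>/_settings."""
    return {ROLLOVER_ALIAS_KEY: alias}
```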

Use the retry failed index API to retry the rollover operation on the failed index. While ISM retries the rollover action, update your index template so that new indices also receive the rollover alias setting:

PUT _index_template/<template-name>

The PUT request replaces the whole template, so include all the settings from your existing index template along with the rollover alias. This way, the rollover alias is applied to newly created indices.

Example:

PUT _index_template/<existing-template> 
{
     "index_patterns": [
          "<index-pattern*>"
     ],
     "template": {
          "settings": {
               "plugins.index_state_management.rollover_alias": "<rollover-alias>"
          }
     }
}
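Because the PUT request replaces the template, one way to avoid dropping existing settings is to start from the current template body. The following sketch assumes that you fetched the template with GET _index_template/<template-name>?flat_settings=true, so that settings use dotted keys and the rollover alias key can be overwritten without a nested duplicate. The function name template_with_rollover_alias is hypothetical.

```python
import copy

ROLLOVER_ALIAS_KEY = "index.plugins.index_state_management.rollover_alias"


def template_with_rollover_alias(get_response: dict, name: str, alias: str) -> dict:
    """Build a PUT _index_template/<name> body that keeps every existing field
    of the template and adds (or overwrites) the rollover alias setting."""
    for entry in get_response.get("index_templates", []):
        if entry["name"] == name:
            # Copy so that the fetched response stays unchanged.
            body = copy.deepcopy(entry["index_template"])
            break
    else:
        raise KeyError(f"index template {name!r} not found")
    settings = body.setdefault("template", {}).setdefault("settings", {})
    settings[ROLLOVER_ALIAS_KEY] = alias
    return body
```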

The index name doesn't match the index pattern

If the explain API output shows that the rollover failed because the index name doesn't match the index pattern, then check the name of the failed index. For a rollover to succeed, the index name must match the following regex pattern:

`^.*-\d+$`

This regex pattern requires the index name to end with a hyphen (-) followed by one or more digits, for example logs-000001. If the index name doesn't follow this pattern and your first index already has data written to it, then reindex the data into a new index that has a correctly formatted name.
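You can check candidate names against the same regex before you create an index. The following sketch applies the pattern from this article. The next_rollover_name helper is an assumption about rollover naming, based on our understanding that the rollover operation increments the trailing number and zero-pads it to six digits. Both function names are hypothetical.

```python
import re

# The pattern from this article: any prefix, a hyphen, then one or more digits.
ROLLOVER_NAME_PATTERN = re.compile(r"^.*-\d+$")


def is_rollover_name(index_name: str) -> bool:
    return ROLLOVER_NAME_PATTERN.match(index_name) is not None


def next_rollover_name(index_name: str) -> str:
    """Increment the trailing number, zero-padded to six digits (assumed rollover naming)."""
    if not is_rollover_name(index_name):
        raise ValueError(f"{index_name!r} does not end in -<digits>")
    prefix, _, number = index_name.rpartition("-")
    return f"{prefix}-{int(number) + 1:06d}"
```

For example, is_rollover_name("logs-000001-backup") is False because the name ends in text instead of digits.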

Example:

POST _reindex
{
     "source": {
          "index": "<failed-index>"
     },
     "dest": {
          "index": "my-new-index-000001"
     }
}

While the reindex operation runs, remove the rollover alias from the failed index and add it to the new index. This way, your data source continues to write incoming data to the new index. For more information, see the reindex document API on the OpenSearch website.

Example:

POST /_aliases
{
     "actions": [{
          "remove": {
               "index": "<failed-index>",
               "alias": "<rollover-alias>"
          }
     },
     {
          "add": {
               "index": "my-new-index-000001",
               "alias": "<rollover-alias>"
          }
     }]
}
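If you script this step, you can build the request body in code. The following sketch generates the same remove and add actions. Because both actions are sent in one POST /_aliases request, OpenSearch applies them atomically, so there's no moment when neither index has the alias. The function name swap_alias_actions is hypothetical.

```python
import json


def swap_alias_actions(old_index: str, new_index: str, alias: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


print(json.dumps(swap_alias_actions("<failed-index>", "my-new-index-000001", "<rollover-alias>"), indent=2))
```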

Use the following API call to manually attach the ISM policy to the new index:

POST _plugins/_ism/add/my-new-index-*
{
     "policy_id": "<policy_id>"
}

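Update the existing template so that its index pattern matches the new index names. Because the PUT request replaces the whole template, keep the rollover alias setting. For example:

PUT _index_template/<existing-template>
{
     "index_patterns": ["<my-new-index-pattern*>"],
     "template": {
          "settings": { "plugins.index_state_management.rollover_alias": "<rollover-alias>" }
     }
}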

Note: Make sure that your ISM policy and rollover alias apply to all successive indices that are created with the new index pattern.

The rollover alias is pointing to a duplicated alias in an index template

If your index rollover failed because a rollover alias points to a duplicated alias, then check your index template settings:

GET _index_template/<template-name>

Check whether your template contains an additional aliases section with another alias that points to the same index:

{
     "index_patterns": ["my-index*"],
     "template": {
          "settings": {
               "index.plugins.index_state_management.rollover_alias": "<rollover-alias>"
          },
          "aliases": {
               "another_alias": {
                    "is_write_index": true
               }
          }
     }
}

The additional alias causes the rollover to fail. To resolve this failure, update the template so that it doesn't specify any aliases:

PUT _index_template/<template-name>
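To find the affected templates and prepare the update, you can process the GET _index_template response, which wraps each template as {"name": ..., "index_template": {...}} inside an "index_templates" list. The following sketch assumes that shape. It lists templates that define aliases and returns a copy of a template body without its aliases section, which you can send as the PUT body. Both function names are hypothetical.

```python
import copy


def templates_with_aliases(get_response: dict) -> list:
    """Names of index templates (GET _index_template output) that define aliases."""
    return [
        entry["name"]
        for entry in get_response.get("index_templates", [])
        if entry["index_template"].get("template", {}).get("aliases")
    ]


def without_aliases(index_template: dict) -> dict:
    """Copy of an index_template body with its aliases section removed."""
    body = copy.deepcopy(index_template)
    body.get("template", {}).pop("aliases", None)
    return body
```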

Then, call the retry API on the failed index:

POST _plugins/_ism/retry/logs-000001

Important: If an alias points to multiple indices, then make sure that only one of those indices is the write index (is_write_index set to true). The rollover operation automatically directs writes through the rollover alias to the newly created index. When ISM performs the rollover, you don't need to set is_write_index on any alias.
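To spot aliases that point to several indices without exactly one write index, you can inspect GET _cat/aliases?format=json output. The following sketch assumes that each row has "alias", "index", and "is_write_index" fields, and that is_write_index is the string "true" only when it's explicitly set to true. The function name aliases_needing_write_index is hypothetical.

```python
from collections import defaultdict


def aliases_needing_write_index(cat_aliases_rows: list) -> list:
    """Aliases that point to more than one index but don't have exactly one write index."""
    indices = defaultdict(set)
    writers = defaultdict(set)
    for row in cat_aliases_rows:
        indices[row["alias"]].add(row["index"])
        if row.get("is_write_index") == "true":
            writers[row["alias"]].add(row["index"])
    return sorted(
        alias for alias, idx in indices.items() if len(idx) > 1 and len(writers[alias]) != 1
    )
```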

You have maximum resource utilization on your cluster

When your cluster reaches maximum resource utilization, a circuit breaker exception or a lack of storage space can cause the rollover to fail.

Circuit breaker exception

If the cause is a circuit breaker exception, then your cluster likely experienced high JVM memory pressure when the rollover API was called. To troubleshoot JVM memory pressure issues, see How do I troubleshoot high JVM memory pressure on my OpenSearch Service cluster?

After the JVM memory pressure falls below 75%, retry the action on the failed index with the following API call:

POST _plugins/_ism/retry/<failed-index-name>

Note: You can use index patterns (*) to retry the action on multiple failed indices.

If you experience infrequent JVM spikes on your cluster, then you can also update the ISM policy with the following retry block for the rollover action:

{
     "actions": [{
          "rollover": {
               "min_size": "<rollover-min-size>"
          },
          "retry": {
               "count": 3,
               "backoff": "exponential",
               "delay": "10m"
          }
     }]
}

In your ISM policy, each action automatically retries up to the number of times that's set in the count parameter. The delay parameter sets the base wait time before ISM retries a failed action, and the backoff parameter (exponential, constant, or linear) controls how that wait time grows between retries. For more information, see Actions on the OpenSearch website.
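To estimate how long a retry block can keep an action alive, you can compute the wait time before each retry. The following is a simplified model, not the plugin's implementation. It assumes that exponential backoff doubles the delay for each retry, that linear backoff multiplies the delay by the retry number, and that constant backoff always waits for the delay. The function name retry_delays_minutes is hypothetical.

```python
def retry_delays_minutes(count: int, backoff: str, delay_minutes: float) -> list:
    """Approximate wait (in minutes) before each retry attempt, 1-based."""
    delays = []
    for attempt in range(1, count + 1):
        if backoff == "exponential":
            delays.append(delay_minutes * 2 ** (attempt - 1))
        elif backoff == "linear":
            delays.append(delay_minutes * attempt)
        elif backoff == "constant":
            delays.append(delay_minutes)
        else:
            raise ValueError(f"unknown backoff: {backoff}")
    return delays


print(retry_delays_minutes(3, "exponential", 10))
```

Under this model, the example retry block waits about 10, 20, and 40 minutes. ISM evaluates retries only when its job runs, so in OpenSearch Service the actual wait can be longer because ISM cycles run every 30 to 48 minutes.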

Lack of storage space

If your cluster is running out of storage space, then OpenSearch Service initiates a write block on the cluster. The write block causes all write operations to return a ClusterBlockException, and any attempt to create a new index fails. Your ClusterIndexWritesBlocked metric shows a value of "1", which indicates that the cluster is blocking requests. The explain API output also shows a 403 IndexCreateBlockException, which indicates that the cluster is out of storage space. To troubleshoot the cluster block exception, see How do I resolve the 403 "index_create_block_exception" error in OpenSearch Service?

After the ClusterIndexWritesBlocked metric returns to "0", retry the ISM action on the failed index. A write block might also be initiated if your JVM memory pressure exceeds 92% for more than 30 minutes. If high JVM memory pressure, not low storage space, caused the write block, then troubleshoot the JVM memory pressure instead. For more information about how to troubleshoot JVM memory pressure, see How do I troubleshoot high JVM memory pressure on my OpenSearch Service cluster?
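If you automate retries, then you can gate them on the thresholds in this article. The following sketch assumes that you already collected the ClusterIndexWritesBlocked value and one-minute JVMMemoryPressure samples (oldest first) from Amazon CloudWatch. It approximates "more than 30 minutes above 92%" as 30 consecutive samples above 92%, and it treats a retry as safe when writes aren't blocked and the latest JVM memory pressure is below 75%. Both function names are hypothetical.

```python
def sustained_jvm_pressure(pressure_per_minute: list, threshold: float = 92.0, minutes: int = 30) -> bool:
    """True if each of the last `minutes` one-minute samples is above `threshold`."""
    recent = pressure_per_minute[-minutes:]
    return len(recent) == minutes and all(p > threshold for p in recent)


def ready_to_retry(writes_blocked_latest: float, pressure_per_minute: list, jvm_limit: float = 75.0) -> bool:
    """True once ClusterIndexWritesBlocked is 0 and the latest JVM pressure is below the limit."""
    return (
        writes_blocked_latest == 0
        and bool(pressure_per_minute)
        and pressure_per_minute[-1] < jvm_limit
    )
```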

AWS OFFICIAL, updated 10 months ago