elasticsearch/docs/reference/ilm/error-handling.asciidoc
James Rodewig f56a0f4b66
[DOCS] Remove testenv annotations from doc snippet tests (#80023)
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.

Relates to #79309, #31619
2021-11-05 18:38:50 -04:00

[role="xpack"]
[[index-lifecycle-error-handling]]
== Troubleshooting {ilm} errors
When {ilm-init} executes a lifecycle policy, it's possible for errors to occur
while performing the necessary index operations for a step.
When this happens, {ilm-init} moves the index to an `ERROR` step.
If {ilm-init} cannot resolve the error automatically, execution is halted
until you resolve the underlying issues with the policy, index, or cluster.
For example, you might have a `shrink-index` policy that shrinks an index to four shards once it
is at least five days old:
[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST
Nothing prevents you from applying the `shrink-index` policy to a new
index that has only two shards:
[source,console]
--------------------------------------------------
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
--------------------------------------------------
// TEST[continued]
After five days, {ilm-init} attempts to shrink `my-index-000001` from two shards to four shards.
Because the shrink action cannot _increase_ the number of shards, this operation fails
and {ilm-init} moves `my-index-000001` to the `ERROR` step.
You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to get information about
what went wrong:
[source,console]
--------------------------------------------------
GET /my-index-000001/_ilm/explain
--------------------------------------------------
// TEST[continued]
The API returns the following information:
[source,console-result]
--------------------------------------------------
{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,
      "policy" : "shrink-index",                <1>
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d",                            <2>
      "phase" : "warm",                         <3>
      "phase_time_millis" : 1541717272601,
      "action" : "shrink",                      <4>
      "action_time_millis" : 1541717272601,
      "step" : "ERROR",                         <5>
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink",                 <6>
      "step_info" : {
        "type" : "illegal_argument_exception",  <7>
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : {                  <8>
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:no way to know if we will get this response immediately]
<1> The policy being used to manage the index: `shrink-index`
<2> The index age: 5.1 days
<3> The phase the index is currently in: `warm`
<4> The current action: `shrink`
<5> The step the index is currently in: `ERROR`
<6> The step that failed to execute: `shrink`
<7> The type of error and a description of that error.
<8> The definition of the current phase from the `shrink-index` policy
To resolve this, you could update the policy to shrink the index to a single shard after five days:
[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]
[discrete]
=== Retrying failed lifecycle policy steps
Once you fix the problem that put an index in the `ERROR` step,
you might need to explicitly tell {ilm-init} to retry the step:
[source,console]
--------------------------------------------------
POST /my-index-000001/_ilm/retry
--------------------------------------------------
// TEST[skip:we can't be sure the index is ready to be retried at this point]
{ilm-init} subsequently attempts to re-run the step that failed.
You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to monitor the progress.
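If several indices share a policy, the explain API's `only_errors` query parameter can narrow the response to managed indices that are in an error state.
The following is a minimal sketch; the `my-index-*` pattern is an assumption about how your indices are named:
[source,console]
--------------------------------------------------
GET /my-index-*/_ilm/explain?only_errors=true
--------------------------------------------------
// TEST[skip:illustrative example that assumes indices matching my-index-*]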
[discrete]
=== Common {ilm-init} errors
Here's how to resolve the most common errors reported in the `ERROR` step.
TIP: Problems with rollover aliases are a common cause of errors.
Consider using <<data-streams, data streams>> instead of managing rollover with aliases.
[discrete]
==== Rollover alias [x] can point to multiple indices, found duplicated alias [x] in index template [z]
The target rollover alias is specified in an index template's `index.lifecycle.rollover_alias` setting.
You need to explicitly configure this alias _one time_ when you
<<ilm-gs-alias-bootstrap, bootstrap the initial index>>.
The rollover action then manages setting and updating the alias to
<<rollover-index-api-desc, roll over>> to each subsequent index.
Do not explicitly configure this same alias in the aliases section of an index template.
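As an illustration, a bootstrap request might set the alias directly on the initial index rather than in a template.
The index name `my-rollover-index-000001` and the alias `my-rollover-alias` are hypothetical placeholders:
[source,console]
--------------------------------------------------
PUT /my-rollover-index-000001
{
  "aliases": {
    "my-rollover-alias": {
      "is_write_index": true
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative example with hypothetical index and alias names]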
[discrete]
==== index.lifecycle.rollover_alias [x] does not point to index [y]
Either the index is using the wrong alias or the alias does not exist.
Check the `index.lifecycle.rollover_alias` <<indices-get-settings, index setting>>.
To see what aliases are configured, use <<cat-alias, _cat/aliases>>.
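For example, the following requests retrieve the setting for a single index and list the indices behind one alias.
This sketch assumes a hypothetical index `my-rollover-index-000001` and alias `my-rollover-alias`; substitute your own names:
[source,console]
--------------------------------------------------
GET /my-rollover-index-000001/_settings/index.lifecycle.rollover_alias

GET _cat/aliases/my-rollover-alias?v=true
--------------------------------------------------
// TEST[skip:illustrative example with hypothetical index and alias names]
If the alias returned by the first request does not appear next to the index in the second response, the setting and the alias are out of sync.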
[discrete]
==== Setting [index.lifecycle.rollover_alias] for index [y] is empty or not defined
The `index.lifecycle.rollover_alias` setting must be configured for the rollover action to work.
Update the index settings to set `index.lifecycle.rollover_alias`.
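A request along these lines could set the value on an existing index.
The index and alias names are hypothetical, and the alias is assumed to already point to the index:
[source,console]
--------------------------------------------------
PUT /my-rollover-index-000001/_settings
{
  "index.lifecycle.rollover_alias": "my-rollover-alias"
}
--------------------------------------------------
// TEST[skip:illustrative example with hypothetical index and alias names]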
[discrete]
==== Alias [x] has more than one write index [y,z]
Only one index can be designated as the write index for a particular alias.
Use the <<indices-aliases, aliases>> API to set `is_write_index:false` for all but one index.
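As a sketch, assuming two hypothetical indices that both claim to be the write index for `my-rollover-alias`, a single aliases request can keep the newer index as the write index and clear the flag on the older one:
[source,console]
--------------------------------------------------
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "my-rollover-index-000001",
        "alias": "my-rollover-alias",
        "is_write_index": false
      }
    },
    {
      "add": {
        "index": "my-rollover-index-000002",
        "alias": "my-rollover-alias",
        "is_write_index": true
      }
    }
  ]
}
--------------------------------------------------
// TEST[skip:illustrative example with hypothetical index and alias names]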
[discrete]
==== index name [x] does not match pattern ^.*-\d+
The index name must match the regex pattern `^.*-\d+` for the rollover action to work.
The most common problem is that the index name does not contain trailing digits.
For example, `my-index` does not match the pattern requirement.
Append a numeric value to the index name, for example `my-index-000001`.
[discrete]
==== CircuitBreakingException: [x] data too large, data for [y]
This indicates that the cluster is hitting resource limits.
Before continuing to set up {ilm-init}, you'll need to take steps to alleviate the resource issues.
For more information, see <<circuit-breaker-errors>>.
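One way to see which breaker is under pressure is the node stats API, which reports the limit, the estimated size, and a `tripped` count for each breaker on each node.
This is only a starting point for the investigation, not a fix:
[source,console]
--------------------------------------------------
GET _nodes/stats/breaker
--------------------------------------------------
// TEST[skip:illustrative example; output depends on the cluster]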
[discrete]
==== High disk watermark [x] exceeded on [y]
This indicates that the cluster is running out of disk space.
This can happen when you don't have {ilm} set up to roll over from hot to warm nodes.
Consider adding nodes, upgrading your hardware, or deleting unneeded indices.
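To check how much disk each data node is using before you decide what to remove or resize, you could start with the cat allocation API:
[source,console]
--------------------------------------------------
GET _cat/allocation?v=true
--------------------------------------------------
// TEST[skip:illustrative example; output depends on the cluster]
The `disk.percent` column in the response shows the share of disk in use on each node, which you can compare against the watermark named in the error.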