elasticsearch/docs/reference/cluster/allocation-explain.asciidoc
David Turner 95edc6deb2
Clarify allocation explain if random shard chosen (#75670)
Today we often encounter users that are confused by the behaviour of
calling `GET _cluster/allocation/explain` without a body: it _seems_ to
work, but it explains a random shard, and if this isn't the shard
they're thinking of then it's unclear how to proceed.

With this commit we add a note to the response when a shard was randomly
chosen indicating that it is possible, and possibly useful, to explain a
different shard. We also adjust the exception message in the case when
all shards are assigned to indicate why it's an invalid request and what
to do to make it valid.
2021-08-02 15:14:09 +01:00

[[cluster-allocation-explain]]
=== Cluster allocation explain API
++++
<titleabbrev>Cluster allocation explain</titleabbrev>
++++
Provides an explanation for a shard's current allocation.
[source,console]
----
GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": false,
  "current_node": "my-node"
}
----
// TEST[setup:my_index]
// TEST[s/"primary": false,/"primary": false/]
// TEST[s/"current_node": "my-node"//]
[[cluster-allocation-explain-api-request]]
==== {api-request-title}
`GET _cluster/allocation/explain`
`POST _cluster/allocation/explain`
[[cluster-allocation-explain-api-prereqs]]
==== {api-prereq-title}
* If the {es} {security-features} are enabled, you must have the `monitor` or
`manage` <<privileges-list-cluster,cluster privilege>> to use this API.
[[cluster-allocation-explain-api-desc]]
==== {api-description-title}
The cluster allocation explain API explains shard allocations in the cluster.
For unassigned shards, it explains why the shard is unassigned. For assigned
shards, it explains why the shard remains on its current node and has not
moved or rebalanced to another node. Use this API to diagnose why a shard is
unassigned or why a shard stays on its current node when you expect otherwise.
[[cluster-allocation-explain-api-query-params]]
==== {api-query-parms-title}
`include_disk_info`::
(Optional, Boolean) If `true`, returns information about disk usage and
shard sizes. Defaults to `false`.
`include_yes_decisions`::
(Optional, Boolean) If `true`, returns `YES` decisions in the explanation.
Defaults to `false`.
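
As a minimal sketch of how these parameters combine, the following Python
snippet builds the request path and appends only the parameters that differ
from their defaults. The `explain_path` helper is hypothetical and is not part
of any {es} client.

[source,python]
----
from urllib.parse import urlencode


def explain_path(include_disk_info=False, include_yes_decisions=False):
    """Build the request path (hypothetical helper, not an official client call)."""
    params = {}
    # Both parameters default to false, so only send the ones that are enabled.
    if include_disk_info:
        params["include_disk_info"] = "true"
    if include_yes_decisions:
        params["include_yes_decisions"] = "true"
    path = "_cluster/allocation/explain"
    return f"{path}?{urlencode(params)}" if params else path


print(explain_path(include_disk_info=True, include_yes_decisions=True))
# prints: _cluster/allocation/explain?include_disk_info=true&include_yes_decisions=true
----
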
[[cluster-allocation-explain-api-request-body]]
==== {api-request-body-title}
`current_node`::
(Optional, string) Specifies the node ID or name of a node. If specified, the
API only explains a shard that is currently located on this node.
`index`::
(Optional, string) Specifies the name of the index that you would like an
explanation for.
`primary`::
(Optional, Boolean) If `true`, returns an explanation for the primary shard
with the given shard ID. If `false`, returns an explanation for a replica.
`shard`::
(Optional, integer) Specifies the ID of the shard that you would like an
explanation for.
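
The following Python sketch shows one way to assemble a request body from
these fields, leaving out any field that was not supplied. The `explain_body`
helper is a hypothetical name used only for illustration. The sketch checks for
`None` rather than truthiness so that `shard: 0` and `primary: false` are kept.

[source,python]
----
import json


def explain_body(index=None, shard=None, primary=None, current_node=None):
    """Return a request body containing only the supplied fields (hypothetical helper)."""
    fields = {
        "index": index,
        "shard": shard,
        "primary": primary,
        "current_node": current_node,
    }
    # Compare against None so that shard 0 and primary=False are not dropped.
    return {name: value for name, value in fields.items() if value is not None}


print(json.dumps(explain_body(index="my-index-000001", shard=0, primary=False)))
# prints: {"index": "my-index-000001", "shard": 0, "primary": false}
----

Calling `explain_body()` with no arguments returns an empty dictionary, which
corresponds to calling the API without a request body.
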
[[cluster-allocation-explain-api-examples]]
==== {api-examples-title}
===== Unassigned primary shard
The following request gets an allocation explanation for an unassigned primary
shard.
////
[source,console]
----
PUT my-index-000001?master_timeout=1s&timeout=1s
{
  "settings": {
    "index.routing.allocation.include._name": "nonexistent_node",
    "index.routing.allocation.include._tier_preference": null
  }
}
----
////
[source,console]
----
GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": true
}
----
// TEST[continued]
The API response indicates the shard is unassigned because the index settings
only allow it to be allocated to a nonexistent node.
[source,console-result]
----
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned", <1>
  "unassigned_info" : {
    "reason" : "INDEX_CREATED", <2>
    "at" : "2017-01-04T18:08:16.600Z",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no", <3>
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "8qt2rY-pT6KNZB3-hGfLnw",
      "node_name" : "node-0",
      "transport_address" : "127.0.0.1:9401",
      "node_attributes" : {},
      "node_decision" : "no", <4>
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter", <5>
          "decision" : "NO",
          "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]" <6>
        }
      ]
    }
  ]
}
----
// TESTRESPONSE[s/"at" : "[^"]*"/"at" : $body.$_path/]
// TESTRESPONSE[s/"node_id" : "[^"]*"/"node_id" : $body.$_path/]
// TESTRESPONSE[s/"transport_address" : "[^"]*"/"transport_address" : $body.$_path/]
// TESTRESPONSE[s/"node_attributes" : \{\}/"node_attributes" : $body.$_path/]
<1> The current state of the shard.
<2> The reason for the shard originally becoming unassigned.
<3> Whether the shard can be allocated.
<4> Whether the shard can be allocated to this particular node.
<5> The decider which led to the `no` decision for the node.
<6> An explanation as to why the decider returned a `no` decision, with a helpful hint pointing to the setting that led to the decision.
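
When a cluster has many nodes, it can help to reduce a response like the one
above to the deciders that returned `NO`. The following Python sketch does
this for a response that has already been parsed into a dictionary. The
`blocking_deciders` helper and the abbreviated `response` value are
illustrative assumptions; the field names are the ones shown in the response
above.

[source,python]
----
def blocking_deciders(response):
    """Map each node name to the (decider, explanation) pairs that returned NO."""
    blocked = {}
    for node in response.get("node_allocation_decisions", []):
        reasons = [
            (decider["decider"], decider["explanation"])
            for decider in node.get("deciders", [])
            if decider["decision"] == "NO"
        ]
        if reasons:
            blocked[node["node_name"]] = reasons
    return blocked


# Abbreviated copy of the example response above.
response = {
    "current_state": "unassigned",
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "node-0",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": "node does not match index setting "
                    "[index.routing.allocation.include] filters "
                    '[_name:"nonexistent_node"]',
                }
            ],
        }
    ],
}

for node_name, reasons in blocking_deciders(response).items():
    for decider, explanation in reasons:
        print(f"{node_name}: {decider}: {explanation}")
----
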
The following response contains an allocation explanation for an unassigned
primary shard that was previously allocated.
[source,js]
----
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:03:28.464Z",
    "details" : "node_left[OIWe8UhhThCK0V5XfmdrmQ]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster"
}
----
// NOTCONSOLE
===== Unassigned replica shard
The following response contains an allocation explanation for a replica that's
unassigned due to <<delayed-allocation,delayed allocation>>.
[source,js]
----
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:53:59.498Z",
    "details" : "node_left[G92ZwuuaRY-9n8_tc-IzEg]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "allocation_delayed",
  "allocate_explanation" : "cannot allocate because the cluster is still waiting 59.8s for the departed node holding a replica to rejoin, despite being allowed to allocate the shard to at least one other node",
  "configured_delay" : "1m", <1>
  "configured_delay_in_millis" : 60000,
  "remaining_delay" : "59.8s", <2>
  "remaining_delay_in_millis" : 59824,
  "node_allocation_decisions" : [
    {
      "node_id" : "pmnHu_ooQWCPEFobZGbpWw",
      "node_name" : "node_t2",
      "transport_address" : "127.0.0.1:9402",
      "node_decision" : "yes"
    },
    {
      "node_id" : "3sULLVJrRneSg0EfBB-2Ew",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "node_decision" : "no",
      "store" : { <3>
        "matching_size" : "4.2kb",
        "matching_size_in_bytes" : 4325
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[my-index-000001][0], node[3sULLVJrRneSg0EfBB-2Ew], [P], s[STARTED], a[id=eV9P8BN1QPqRc3B4PLx6cg]]"
        }
      ]
    }
  ]
}
----
// NOTCONSOLE
<1> The configured delay before allocating a replica shard that is missing because the node holding it left the cluster.
<2> The remaining delay before allocating the replica shard.
<3> Information about the shard data found on a node.
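
As a sketch of how a script might read a delayed-allocation response, the
following Python snippet reports the remaining delay in seconds and the nodes
whose `node_decision` is `yes`. The `delayed_allocation_summary` helper and
the abbreviated `response` value are assumptions made for illustration; the
field names come from the response above.

[source,python]
----
def delayed_allocation_summary(response):
    """Return the remaining delay and eligible nodes, or None if allocation is not delayed."""
    if response.get("can_allocate") != "allocation_delayed":
        return None
    eligible = [
        node["node_name"]
        for node in response.get("node_allocation_decisions", [])
        if node.get("node_decision") == "yes"
    ]
    return {
        "remaining_seconds": response["remaining_delay_in_millis"] / 1000,
        "eligible_nodes": eligible,
    }


# Abbreviated copy of the example response above.
response = {
    "can_allocate": "allocation_delayed",
    "configured_delay_in_millis": 60000,
    "remaining_delay_in_millis": 59824,
    "node_allocation_decisions": [
        {"node_name": "node_t2", "node_decision": "yes"},
        {"node_name": "node_t0", "node_decision": "no"},
    ],
}

print(delayed_allocation_summary(response))
# prints: {'remaining_seconds': 59.824, 'eligible_nodes': ['node_t2']}
----
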
===== Assigned shard
The following response contains an allocation explanation for an assigned shard.
The response indicates the shard is not allowed to remain on its current node
and must be reallocated.
[source,js]
----
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "8lWJeJ7tSoui0bxrwuNhTA",
    "name" : "node_t1",
    "transport_address" : "127.0.0.1:9401"
  },
  "can_remain_on_current_node" : "no", <1>
  "can_remain_decisions" : [ <2>
    {
      "decider" : "filter",
      "decision" : "NO",
      "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"
    }
  ],
  "can_move_to_other_node" : "no", <3>
  "move_explanation" : "cannot move shard to another node, even though it is not allowed to remain on its current node",
  "node_allocation_decisions" : [
    {
      "node_id" : "_P8olZS8Twax9u6ioN-GGA",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"
        }
      ]
    }
  ]
}
----
// NOTCONSOLE
<1> Whether the shard is allowed to remain on its current node.
<2> The deciders that contributed to the decision that the shard cannot remain on its current node.
<3> Whether the shard is allowed to be allocated to another node.
The following response contains an allocation explanation for a shard that must
remain on its current node. Moving the shard to another node would not improve
cluster balance.
[source,js]
----
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "wLzJm4N4RymDkBYxwWoJsg",
    "name" : "node_t0",
    "transport_address" : "127.0.0.1:9400",
    "weight_ranking" : 1
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "yes", <1>
  "can_rebalance_to_other_node" : "no", <2>
  "rebalance_explanation" : "cannot rebalance as no target node exists that can both allocate this shard and improve the cluster balance",
  "node_allocation_decisions" : [
    {
      "node_id" : "oE3EGFc8QN-Tdi5FFEprIA",
      "node_name" : "node_t1",
      "transport_address" : "127.0.0.1:9401",
      "node_decision" : "worse_balance", <3>
      "weight_ranking" : 1
    }
  ]
}
----
// NOTCONSOLE
<1> Whether rebalancing is allowed on the cluster.
<2> Whether the shard can be rebalanced to another node.
<3> The reason the shard cannot be rebalanced to the node, in this case indicating that it offers no better balance than the current node.
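
The two assigned-shard responses above differ mainly in the
`can_remain_on_current_node`, `can_move_to_other_node`, and
`can_rebalance_to_other_node` fields. The following Python sketch tells the
two cases apart. The `describe_assigned_shard` helper is hypothetical, and it
only handles the `yes` and `no` values shown in these examples; it treats any
other value as "not ruled out".

[source,python]
----
def describe_assigned_shard(response):
    """Summarize an explain response for an assigned shard (hypothetical helper)."""
    if "current_node" not in response:
        return "shard is not assigned to a node"
    if response.get("can_remain_on_current_node") == "no":
        # The shard must leave its node; check whether another node accepts it.
        if response.get("can_move_to_other_node") == "no":
            return "cannot remain and cannot move: " + response.get("move_explanation", "")
        return "cannot remain; a move is not ruled out"
    if response.get("can_rebalance_to_other_node") == "no":
        return "remains on current node: " + response.get("rebalance_explanation", "")
    return "remains on current node; rebalancing is not ruled out"


# Abbreviated copies of the two example responses above.
stuck = {
    "current_node": {"name": "node_t1"},
    "can_remain_on_current_node": "no",
    "can_move_to_other_node": "no",
    "move_explanation": "cannot move shard to another node, even though it is "
    "not allowed to remain on its current node",
}
balanced = {
    "current_node": {"name": "node_t0"},
    "can_remain_on_current_node": "yes",
    "can_rebalance_cluster": "yes",
    "can_rebalance_to_other_node": "no",
    "rebalance_explanation": "cannot rebalance as no target node exists that "
    "can both allocate this shard and improve the cluster balance",
}

print(describe_assigned_shard(stuck))
print(describe_assigned_shard(balanced))
----
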
===== No arguments
If you call the API with no arguments, {es} retrieves an allocation explanation
for an arbitrary unassigned primary or replica shard.
[source,console]
----
GET _cluster/allocation/explain
----
// TEST[catch:bad_request]
If the cluster contains no unassigned shards, the API returns a `400` error.