mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
The cluster allocation explain API includes a top-level status indicating to the user whether the shard can be assigned/rebalanced/etc or not. Today this status is fairly terse and experience shows that users sometimes struggle to understand how to interpret it and to decide on follow-up actions. This commit makes the top-level explanation more detailed and actionable. For instance, in the cases like `THROTTLED` where the status is transient we instruct the user to wait; if a shard is lost we say to restore it from a snapshot; if a shard cannot be assigned we say to choose a specific node where its assignment is expected and to address the obstacles. Co-authored-by: James Rodewig <james.rodewig@elastic.co>
338 lines
11 KiB
Text
338 lines
11 KiB
Text
[[cluster-allocation-explain]]
|
|
=== Cluster allocation explain API
|
|
++++
|
|
<titleabbrev>Cluster allocation explain</titleabbrev>
|
|
++++
|
|
|
|
Provides an explanation for a shard's current allocation.
|
|
|
|
[source,console]
|
|
----
|
|
GET _cluster/allocation/explain
|
|
{
|
|
"index": "my-index-000001",
|
|
"shard": 0,
|
|
"primary": false,
|
|
"current_node": "my-node"
|
|
}
|
|
----
|
|
// TEST[setup:my_index]
|
|
// TEST[s/"primary": false,/"primary": false/]
|
|
// TEST[s/"current_node": "my-node"//]
|
|
|
|
[[cluster-allocation-explain-api-request]]
|
|
==== {api-request-title}
|
|
|
|
`GET _cluster/allocation/explain`
|
|
|
|
`POST _cluster/allocation/explain`
|
|
|
|
[[cluster-allocation-explain-api-prereqs]]
|
|
==== {api-prereq-title}
|
|
|
|
* If the {es} {security-features} are enabled, you must have the `monitor` or
|
|
`manage` <<privileges-list-cluster,cluster privilege>> to use this API.
|
|
|
|
[[cluster-allocation-explain-api-desc]]
|
|
==== {api-description-title}
|
|
|
|
The purpose of the cluster allocation explain API is to provide
|
|
explanations for shard allocations in the cluster. For unassigned shards,
|
|
the explain API provides an explanation for why the shard is unassigned.
|
|
For assigned shards, the explain API provides an explanation for why the
|
|
shard is remaining on its current node and has not moved or rebalanced to
|
|
another node. This API can be very useful when attempting to diagnose why a
|
|
shard is unassigned or why a shard continues to remain on its current node when
|
|
you might expect otherwise.
|
|
|
|
[[cluster-allocation-explain-api-query-params]]
|
|
==== {api-query-parms-title}
|
|
|
|
`include_disk_info`::
|
|
(Optional, Boolean) If `true`, returns information about disk usage and
|
|
shard sizes. Defaults to `false`.
|
|
|
|
`include_yes_decisions`::
|
|
(Optional, Boolean) If `true`, returns 'YES' decisions in explanation.
|
|
Defaults to `false`.
|
|
|
|
[[cluster-allocation-explain-api-request-body]]
|
|
==== {api-request-body-title}
|
|
|
|
`current_node`::
|
|
(Optional, string) Specifies the node ID or the name of the node to only
|
|
explain a shard that is currently located on the specified node.
|
|
|
|
`index`::
|
|
(Optional, string) Specifies the name of the index that you would like an
|
|
explanation for.
|
|
|
|
`primary`::
|
|
(Optional, Boolean) If `true`, returns explanation for the primary shard
|
|
for the given shard ID.
|
|
|
|
`shard`::
|
|
(Optional, integer) Specifies the ID of the shard that you would like an
|
|
explanation for.
|
|
|
|
[[cluster-allocation-explain-api-examples]]
|
|
==== {api-examples-title}
|
|
|
|
===== Unassigned primary shard
|
|
|
|
The following request gets an allocation explanation for an unassigned primary
|
|
shard.
|
|
|
|
////
|
|
[source,console]
|
|
----
|
|
PUT my-index-000001?master_timeout=1s&timeout=1s
|
|
{
|
|
"settings": {
|
|
"index.routing.allocation.include._name": "nonexistent_node",
|
|
"index.routing.allocation.include._tier_preference": null
|
|
}
|
|
}
|
|
----
|
|
////
|
|
|
|
[source,console]
|
|
----
|
|
GET _cluster/allocation/explain
|
|
{
|
|
"index": "my-index-000001",
|
|
"shard": 0,
|
|
"primary": true
|
|
}
|
|
----
|
|
// TEST[continued]
|
|
|
|
The API response indicates the shard can only be allocated to a nonexistent
|
|
node.
|
|
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"index" : "my-index-000001",
|
|
"shard" : 0,
|
|
"primary" : true,
|
|
"current_state" : "unassigned", <1>
|
|
"unassigned_info" : {
|
|
"reason" : "INDEX_CREATED", <2>
|
|
"at" : "2017-01-04T18:08:16.600Z",
|
|
"last_allocation_status" : "no"
|
|
},
|
|
"can_allocate" : "no", <3>
|
|
"allocate_explanation" : "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
|
|
"node_allocation_decisions" : [
|
|
{
|
|
"node_id" : "8qt2rY-pT6KNZB3-hGfLnw",
|
|
"node_name" : "node-0",
|
|
"transport_address" : "127.0.0.1:9401",
|
|
"node_attributes" : {},
|
|
"node_decision" : "no", <4>
|
|
"weight_ranking" : 1,
|
|
"deciders" : [
|
|
{
|
|
"decider" : "filter", <5>
|
|
"decision" : "NO",
|
|
"explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]" <6>
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
----
|
|
// TESTRESPONSE[s/"at" : "[^"]*"/"at" : $body.$_path/]
|
|
// TESTRESPONSE[s/"node_id" : "[^"]*"/"node_id" : $body.$_path/]
|
|
// TESTRESPONSE[s/"transport_address" : "[^"]*"/"transport_address" : $body.$_path/]
|
|
// TESTRESPONSE[s/"node_attributes" : \{\}/"node_attributes" : $body.$_path/]
|
|
|
|
<1> The current state of the shard.
|
|
<2> The reason for the shard originally becoming unassigned.
|
|
<3> Whether to allocate the shard.
|
|
<4> Whether to allocate the shard to the particular node.
|
|
<5> The decider which led to the `no` decision for the node.
|
|
<6> An explanation as to why the decider returned a `no` decision, with a helpful hint pointing to the setting that led to the decision.
|
|
|
|
The following response contains an allocation explanation for an unassigned
|
|
primary shard that was previously allocated.
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"index" : "my-index-000001",
|
|
"shard" : 0,
|
|
"primary" : true,
|
|
"current_state" : "unassigned",
|
|
"unassigned_info" : {
|
|
"reason" : "NODE_LEFT",
|
|
"at" : "2017-01-04T18:03:28.464Z",
|
|
"details" : "node_left[OIWe8UhhThCK0V5XfmdrmQ]",
|
|
"last_allocation_status" : "no_valid_shard_copy"
|
|
},
|
|
"can_allocate" : "no_valid_shard_copy",
|
|
"allocate_explanation" : "Elasticsearch can't allocate this shard because there are no copies of its data in the cluster. Elasticsearch will allocate this shard when a node holding a good copy of its data joins the cluster. If no such node is available, restore this index from a recent snapshot."
|
|
}
|
|
----
|
|
// NOTCONSOLE
|
|
|
|
===== Unassigned replica shard
|
|
|
|
The following response contains an allocation explanation for a replica that's
|
|
unassigned due to <<delayed-allocation,delayed allocation>>.
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"index" : "my-index-000001",
|
|
"shard" : 0,
|
|
"primary" : false,
|
|
"current_state" : "unassigned",
|
|
"unassigned_info" : {
|
|
"reason" : "NODE_LEFT",
|
|
"at" : "2017-01-04T18:53:59.498Z",
|
|
"details" : "node_left[G92ZwuuaRY-9n8_tc-IzEg]",
|
|
"last_allocation_status" : "no_attempt"
|
|
},
|
|
"can_allocate" : "allocation_delayed",
|
|
"allocate_explanation" : "The node containing this shard copy recently left the cluster. Elasticsearch is waiting for it to return. If the node does not return within [%s] then Elasticsearch will allocate this shard to another node. Please wait.",
|
|
"configured_delay" : "1m", <1>
|
|
"configured_delay_in_millis" : 60000,
|
|
"remaining_delay" : "59.8s", <2>
|
|
"remaining_delay_in_millis" : 59824,
|
|
"node_allocation_decisions" : [
|
|
{
|
|
"node_id" : "pmnHu_ooQWCPEFobZGbpWw",
|
|
"node_name" : "node_t2",
|
|
"transport_address" : "127.0.0.1:9402",
|
|
"node_decision" : "yes"
|
|
},
|
|
{
|
|
"node_id" : "3sULLVJrRneSg0EfBB-2Ew",
|
|
"node_name" : "node_t0",
|
|
"transport_address" : "127.0.0.1:9400",
|
|
"node_decision" : "no",
|
|
"store" : { <3>
|
|
"matching_size" : "4.2kb",
|
|
"matching_size_in_bytes" : 4325
|
|
},
|
|
"deciders" : [
|
|
{
|
|
"decider" : "same_shard",
|
|
"decision" : "NO",
|
|
"explanation" : "a copy of this shard is already allocated to this node [[my-index-000001][0], node[3sULLVJrRneSg0EfBB-2Ew], [P], s[STARTED], a[id=eV9P8BN1QPqRc3B4PLx6cg]]"
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
----
|
|
// NOTCONSOLE
|
|
|
|
<1> The configured delay before allocating a replica shard that does not exist due to the node holding it leaving the cluster.
|
|
<2> The remaining delay before allocating the replica shard.
|
|
<3> Information about the shard data found on a node.
|
|
|
|
===== Assigned shard
|
|
|
|
The following response contains an allocation explanation for an assigned shard.
|
|
The response indicates the shard is not allowed to remain on its current node
|
|
and must be reallocated.
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"index" : "my-index-000001",
|
|
"shard" : 0,
|
|
"primary" : true,
|
|
"current_state" : "started",
|
|
"current_node" : {
|
|
"id" : "8lWJeJ7tSoui0bxrwuNhTA",
|
|
"name" : "node_t1",
|
|
"transport_address" : "127.0.0.1:9401"
|
|
},
|
|
"can_remain_on_current_node" : "no", <1>
|
|
"can_remain_decisions" : [ <2>
|
|
{
|
|
"decider" : "filter",
|
|
"decision" : "NO",
|
|
"explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"
|
|
}
|
|
],
|
|
"can_move_to_other_node" : "no", <3>
|
|
"move_explanation" : "This shard may not remain on its current node, but Elasticsearch isn't allowed to move it to another node. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
|
|
"node_allocation_decisions" : [
|
|
{
|
|
"node_id" : "_P8olZS8Twax9u6ioN-GGA",
|
|
"node_name" : "node_t0",
|
|
"transport_address" : "127.0.0.1:9400",
|
|
"node_decision" : "no",
|
|
"weight_ranking" : 1,
|
|
"deciders" : [
|
|
{
|
|
"decider" : "filter",
|
|
"decision" : "NO",
|
|
"explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
----
|
|
// NOTCONSOLE
|
|
|
|
<1> Whether the shard is allowed to remain on its current node.
|
|
<2> The deciders that factored into the decision of why the shard is not allowed to remain on its current node.
|
|
<3> Whether the shard is allowed to be allocated to another node.
|
|
|
|
The following response contains an allocation explanation for a shard that must
|
|
remain on its current node. Moving the shard to another node would not improve
|
|
cluster balance.
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"index" : "my-index-000001",
|
|
"shard" : 0,
|
|
"primary" : true,
|
|
"current_state" : "started",
|
|
"current_node" : {
|
|
"id" : "wLzJm4N4RymDkBYxwWoJsg",
|
|
"name" : "node_t0",
|
|
"transport_address" : "127.0.0.1:9400",
|
|
"weight_ranking" : 1
|
|
},
|
|
"can_remain_on_current_node" : "yes",
|
|
"can_rebalance_cluster" : "yes", <1>
|
|
"can_rebalance_to_other_node" : "no", <2>
|
|
"rebalance_explanation" : "Elasticsearch cannot rebalance this shard to another node since there is no node to which allocation is permitted which would improve the cluster balance. If you expect this shard to be rebalanced to another node, find this node in the node-by-node explanation and address the reasons which prevent Elasticsearch from rebalancing this shard there.",
|
|
"node_allocation_decisions" : [
|
|
{
|
|
"node_id" : "oE3EGFc8QN-Tdi5FFEprIA",
|
|
"node_name" : "node_t1",
|
|
"transport_address" : "127.0.0.1:9401",
|
|
"node_decision" : "worse_balance", <3>
|
|
"weight_ranking" : 1
|
|
}
|
|
]
|
|
}
|
|
----
|
|
// NOTCONSOLE
|
|
|
|
<1> Whether rebalancing is allowed on the cluster.
|
|
<2> Whether the shard can be rebalanced to another node.
|
|
<3> The reason the shard cannot be rebalanced to the node, in this case indicating that it offers no better balance than the current node.
|
|
|
|
===== No arguments
|
|
|
|
If you call the API with no arguments, {es} retrieves an allocation explanation
|
|
for an arbitrary unassigned primary or replica shard.
|
|
|
|
[source,console]
|
|
----
|
|
GET _cluster/allocation/explain
|
|
----
|
|
// TEST[catch:bad_request]
|
|
|
|
If the cluster contains no unassigned shards, the API returns a `400` error.
|