[[disk-allocator]]
=== Disk-based shard allocation

Elasticsearch considers the available disk space on a node before deciding
whether to allocate new shards to that node or to actively relocate shards away
from that node.

Below are the settings that can be configured in the `elasticsearch.yml` config
file or updated dynamically on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API:

`cluster.routing.allocation.disk.threshold_enabled`::

    Defaults to `true`. Set to `false` to disable the disk allocation decider.
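+
--
For example, a minimal sketch of disabling the decider dynamically via a
transient cluster setting (a persistent setting could be used instead if the
change should survive a full cluster restart):

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": false
  }
}
--------------------------------------------------
--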

`cluster.routing.allocation.disk.watermark.low`::

    Controls the low watermark for disk usage. It defaults to `85%`, meaning
    that Elasticsearch will not allocate shards to nodes that have more than
    85% disk used. It can also be set to an absolute byte value (like `500mb`)
    to prevent Elasticsearch from allocating shards if less than the specified
    amount of space is available. This setting has no effect on the primary
    shards of newly-created indices or, specifically, on any shards that have
    never previously been allocated.

`cluster.routing.allocation.disk.watermark.high`::

    Controls the high watermark. It defaults to `90%`, meaning that
    Elasticsearch will attempt to relocate shards away from a node whose disk
    usage is above 90%. It can also be set to an absolute byte value (similarly
    to the low watermark) to relocate shards away from a node if it has less
    than the specified amount of free space. This setting affects the
    allocation of all shards, whether previously allocated or not.
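+
--
For example, a sketch of raising both watermarks as percentage values (the
`87%` and `92%` values are illustrative; they must stay consistent with the
flood stage watermark described below):

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "87%",
    "cluster.routing.allocation.disk.watermark.high": "92%"
  }
}
--------------------------------------------------
--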

`cluster.routing.allocation.disk.watermark.flood_stage`::
+
--
Controls the flood stage watermark. It defaults to `95%`, meaning that
Elasticsearch enforces a read-only index block
(`index.blocks.read_only_allow_delete`) on every index that has one or more
shards allocated on a node where at least one disk exceeds the flood stage.
This is a last resort to prevent nodes from running out of disk space. The
index block is automatically released once the disk utilization falls below
the high watermark.

NOTE: You cannot mix the usage of percentage values and byte values within
these settings. Either all values are set to percentages, or all are set to
byte values. This is so that we can validate that the settings are internally
consistent (that is, the low disk threshold is not more than the high disk
threshold, and the high disk threshold is not more than the flood stage
threshold).

An example of resetting the read-only index block on the `twitter` index:

[source,console]
--------------------------------------------------
PUT /twitter/_settings
{
  "index.blocks.read_only_allow_delete": null
}
--------------------------------------------------
// TEST[setup:twitter]
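
To check whether the block is currently applied, the get settings API can be
filtered down to this one setting (a minimal sketch, reusing the `twitter`
index from the example above):

[source,console]
--------------------------------------------------
GET /twitter/_settings/index.blocks.read_only_allow_delete
--------------------------------------------------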
--

`cluster.info.update.interval`::

    How often Elasticsearch should check on disk usage for each node in the
    cluster. Defaults to `30s`.
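+
--
The per-node disk statistics gathered by these periodic checks can be
inspected with the cat allocation API, for example:

[source,console]
--------------------------------------------------
GET _cat/allocation?v
--------------------------------------------------
--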

NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, since it flips the meaning of high and
low. For example, it makes sense to set the low watermark to 10gb and the high
watermark to 5gb, but not the other way around.

An example of updating the low watermark to at least 100 gigabytes free, the
high watermark to at least 50 gigabytes free, and the flood stage watermark to
10 gigabytes free, and updating the information about the cluster every
minute:

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
    "cluster.info.update.interval": "1m"
  }
}
--------------------------------------------------

{es} accounts for the future disk usage of ongoing shard relocations and
recoveries to help prevent these shard movements from breaching a watermark.
This mechanism may double-count some data that has already been relocated onto
a node. For instance, if a relocation of a 100GB shard is 90% complete then
{es} has copied 90GB of data onto the target node. This 90GB consumes disk
space and will be reflected in the node's disk usage statistics. However, {es}
also treats the relocation as if it will consume another full 100GB in the
future, even though the shard may really only consume a further 10GB of space.
If the node's disks are close to a watermark then this may temporarily prevent
other shards from moving onto the same node. Eventually the relocation will
complete and then {es} will use the node's true disk usage statistics again.
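
To review the disk-based allocation settings currently in effect, including
the defaults, one option is to ask the cluster settings API to include default
values (a sketch; the response can be large):

[source,console]
--------------------------------------------------
GET _cluster/settings?include_defaults=true&flat_settings=true
--------------------------------------------------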