elasticsearch/docs/reference/modules/cluster/disk_allocator.asciidoc
Adam Locke 7dd731b9a2
[DOCS] Explain flood stage watermark. (#57184)
* Changes for issue #36114.

* Adding stronger wording to the new note.

* Removing statement about typically not needting to set the read-only allow delete block.

* Replacing Elasticsearch with {es} variable.
2020-05-28 10:57:40 -04:00

107 lines
5 KiB
Text

[[disk-based-shard-allocation]]
==== Disk-based shard allocation settings
{es} considers the available disk space on a node before deciding
whether to allocate new shards to that node or to actively relocate shards away
from that node.
Below are the settings that can be configured in the `elasticsearch.yml` config
file or updated dynamically on a live cluster with the
<<cluster-update-settings,cluster-update-settings>> API:
`cluster.routing.allocation.disk.threshold_enabled`::
Defaults to `true`. Set to `false` to disable the disk allocation decider.
`cluster.routing.allocation.disk.watermark.low`::
Controls the low watermark for disk usage. It defaults to `85%`, meaning
that {es} will not allocate shards to nodes that have more than
85% disk used. It can also be set to an absolute byte value (like `500mb`)
to prevent {es} from allocating shards if less than the specified
amount of space is available. This setting has no effect on the primary
shards of newly-created indices but will prevent their replicas from being allocated.
`cluster.routing.allocation.disk.watermark.high`::
Controls the high watermark. It defaults to `90%`, meaning that
{es} will attempt to relocate shards away from a node whose disk
usage is above 90%. It can also be set to an absolute byte value (similarly
to the low watermark) to relocate shards away from a node if it has less
than the specified amount of free space. This setting affects the
allocation of all shards, whether previously allocated or not.
`cluster.routing.allocation.disk.watermark.enable_for_single_data_node`::
For a single data node, the default is to disregard disk watermarks when
making an allocation decision. This is deprecated behavior and will be
changed in 8.0. This setting can be set to `true` to enable the
disk watermarks for a single data node cluster (will become default in 8.0).
[[cluster-routing-flood_stage]]
`cluster.routing.allocation.disk.watermark.flood_stage`::
+
--
Controls the flood stage watermark, which defaults to 95%. {es} enforces a read-only index block
(`index.blocks.read_only_allow_delete`) on every index that has one or more
shards allocated on the node, and that has at least one disk exceeding the flood
stage. This setting is a last resort to prevent nodes from running out of disk space.
The index block is automatically released when the disk utilization falls below
the high watermark.
NOTE: You cannot mix the usage of percentage values and byte values within
these settings. Either all values are set to percentage values, or all are set to byte
values. This enforcement is so that {es} can validate that the settings are internally
consistent, ensuring that the low disk threshold is less than the high disk
threshold, and the high disk threshold is less than the flood stage
threshold.
An example of resetting the read-only index block on the `twitter` index:
[source,console]
--------------------------------------------------
PUT /twitter/_settings
{
"index.blocks.read_only_allow_delete": null
}
--------------------------------------------------
// TEST[setup:twitter]
--
`cluster.info.update.interval`::
How often {es} should check on disk usage for each node in the
cluster. Defaults to `30s`.
NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, since it flips the meaning of high and
low. For example, it makes sense to set the low watermark to 10gb and the high
watermark to 5gb, but not the other way around.
An example of updating the low watermark to at least 100 gigabytes free, a high
watermark of at least 50 gigabytes free, and a flood stage watermark of 10
gigabytes free, and updating the information about the cluster every minute:
[source,console]
--------------------------------------------------
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "100gb",
"cluster.routing.allocation.disk.watermark.high": "50gb",
"cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
"cluster.info.update.interval": "1m"
}
}
--------------------------------------------------
{es} accounts for the future disk usage of ongoing shard relocations and
recoveries to help prevent these shard movements from breaching a watermark.
This mechanism may double-count some data that has already been relocated onto
a node. For instance, if a relocation of a 100GB shard is 90% complete then
{es} has copied 90GB of data onto the target node. This 90GB consumes disk
space and will be reflected in the node's disk usage statistics. However {es}
also treats the relocation as if it will consume another full 100GB in the
future, even though the shard may really only consume a further 10GB of space.
If the node's disks are close to a watermark then this may temporarily prevent
other shards from moving onto the same node. Eventually the relocation will
complete and then {es} will use the node's true disk usage statistics again.