mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
Today the docs say that the low watermark has no effect on any shards that have never been allocated, but this is confusing. Here "shard" means "replication group" not "shard copy" but this conflicts with the "never been allocated" qualifier since one allocates shard copies and not replication groups. This commit removes the misleading words. A newly-created replication group remains newly-created until one of its copies is assigned, which might be quite some time later, but it seems better to leave this implicit.
102 lines
4.6 KiB
Text
102 lines
4.6 KiB
Text
[[disk-allocator]]
|
|
=== Disk-based shard allocation
|
|
|
|
Elasticsearch considers the available disk space on a node before deciding
|
|
whether to allocate new shards to that node or to actively relocate shards away
|
|
from that node.
|
|
|
|
Below are the settings that can be configured in the `elasticsearch.yml` config
|
|
file or updated dynamically on a live cluster with the
|
|
<<cluster-update-settings,cluster-update-settings>> API:
|
|
|
|
`cluster.routing.allocation.disk.threshold_enabled`::
|
|
|
|
Defaults to `true`. Set to `false` to disable the disk allocation decider.
|
|
|
|
`cluster.routing.allocation.disk.watermark.low`::
|
|
|
|
Controls the low watermark for disk usage. It defaults to `85%`, meaning
|
|
that Elasticsearch will not allocate shards to nodes that have more than
|
|
85% disk used. It can also be set to an absolute byte value (like `500mb`)
|
|
to prevent Elasticsearch from allocating shards if less than the specified
|
|
amount of space is available. This setting has no effect on the primary
|
|
shards of newly-created indices but will prevent their replicas from being allocated.
|
|
|
|
`cluster.routing.allocation.disk.watermark.high`::
|
|
|
|
Controls the high watermark. It defaults to `90%`, meaning that
|
|
Elasticsearch will attempt to relocate shards away from a node whose disk
|
|
usage is above 90%. It can also be set to an absolute byte value (similarly
|
|
to the low watermark) to relocate shards away from a node if it has less
|
|
than the specified amount of free space. This setting affects the
|
|
allocation of all shards, whether previously allocated or not.
|
|
|
|
`cluster.routing.allocation.disk.watermark.flood_stage`::
|
|
+
|
|
--
|
|
Controls the flood stage watermark. It defaults to 95%, meaning that
|
|
Elasticsearch enforces a read-only index block
|
|
(`index.blocks.read_only_allow_delete`) on every index that has one or more
|
|
shards allocated on the node that has at least one disk exceeding the flood
|
|
stage. This is a last resort to prevent nodes from running out of disk space.
|
|
The index block is automatically released once the disk utilization falls below
|
|
the high watermark.
|
|
|
|
NOTE: You can not mix the usage of percentage values and byte values within
|
|
these settings. Either all are set to percentage values, or all are set to byte
|
|
values. This is so that we can we validate that the settings are internally
|
|
consistent (that is, the low disk threshold is not more than the high disk
|
|
threshold, and the high disk threshold is not more than the flood stage
|
|
threshold).
|
|
|
|
An example of resetting the read-only index block on the `twitter` index:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT /twitter/_settings
|
|
{
|
|
"index.blocks.read_only_allow_delete": null
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:twitter]
|
|
--
|
|
|
|
`cluster.info.update.interval`::
|
|
|
|
How often Elasticsearch should check on disk usage for each node in the
|
|
cluster. Defaults to `30s`.
|
|
|
|
NOTE: Percentage values refer to used disk space, while byte values refer to
|
|
free disk space. This can be confusing, since it flips the meaning of high and
|
|
low. For example, it makes sense to set the low watermark to 10gb and the high
|
|
watermark to 5gb, but not the other way around.
|
|
|
|
An example of updating the low watermark to at least 100 gigabytes free, a high
|
|
watermark of at least 50 gigabytes free, and a flood stage watermark of 10
|
|
gigabytes free, and updating the information about the cluster every minute:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT _cluster/settings
|
|
{
|
|
"transient": {
|
|
"cluster.routing.allocation.disk.watermark.low": "100gb",
|
|
"cluster.routing.allocation.disk.watermark.high": "50gb",
|
|
"cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
|
|
"cluster.info.update.interval": "1m"
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
{es} accounts for the future disk usage of ongoing shard relocations and
|
|
recoveries to help prevent these shard movements from breaching a watermark.
|
|
This mechanism may double-count some data that has already been relocated onto
|
|
a node. For instance, if a relocation of a 100GB shard is 90% complete then
|
|
{es} has copied 90GB of data onto the target node. This 90GB consumes disk
|
|
space and will be reflected in the node's disk usage statistics. However {es}
|
|
also treats the relocation as if it will consume another full 100GB in the
|
|
future, even though the shard may really only consume a further 10GB of space.
|
|
If the node's disks are close to a watermark then this may temporarily prevent
|
|
other shards from moving onto the same node. Eventually the relocation will
|
|
complete and then {es} will use the node's true disk usage statistics again.
|
|
|