
// tag::cloud[]
**Use {kib}**
//tag::kibana-api-ex[]
. Log in to the {ess-console}[{ecloud} console].
+
. On the **Elasticsearch Service** panel, click the name of your deployment.
+
NOTE: If the name of your deployment is disabled, your {kib} instances might be
unhealthy, in which case please contact https://support.elastic.co[Elastic Support].
If your deployment doesn't include {kib}, all you need to do is
{cloud}/ec-access-kibana.html[enable it first].
+
. Open your deployment's side navigation menu (placed under the Elastic logo in the upper left corner)
and go to **Stack Management > Index Management**.
. In the list of all your indices, click the `Replicas` column twice to sort the indices by their number of replicas,
starting with the one that has the most. Go through the indices and, one by one, pick the indices with the least
importance and the highest number of replicas.
+
WARNING: Reducing the replicas of an index can potentially reduce search throughput and data redundancy.
+
. For each index you chose, click on its name, then on the panel that appears click `Edit settings`, reduce
`index.number_of_replicas` to the desired value, and then click `Save`.
+
[role="screenshot"]
image::images/troubleshooting/disk/reduce_replicas.png[Reducing replicas,align="center"]
+
. Continue this process until the cluster is healthy again.
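
If you prefer to work from {kib}'s Dev Tools Console rather than **Index Management**, the same change can be made with
the update index settings API. A minimal sketch, assuming a hypothetical index named `my-index` whose replicas you want
to reduce to 1:

[source,console]
----
PUT my-index/_settings
{
  "index.number_of_replicas": 1
}
----
// TEST[skip:illustration purposes only]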
//end::kibana-api-ex[]
// end::cloud[]
// tag::self-managed[]
In order to estimate how many replicas need to be removed, first determine how much disk space needs to be released.

. First, retrieve the relevant disk thresholds that will indicate how much space should be released. The
relevant thresholds are the <<cluster-routing-watermark-high, high watermark>> for all the tiers apart from the frozen
one and the <<cluster-routing-flood-stage-frozen, frozen flood stage watermark>> for the frozen tier. The following
example demonstrates disk shortage in the hot tier, so we will only retrieve the high watermark:
+
[source,console]
----
GET _cluster/settings?include_defaults&filter_path=*.cluster.routing.allocation.disk.watermark.high*
----
+
The response will look like this:
+
[source,console-result]
----
{
  "defaults": {
    "cluster": {
      "routing": {
        "allocation": {
          "disk": {
            "watermark": {
              "high": "90%",
              "high.max_headroom": "150GB"
            }
          }
        }
      }
    }
  }
}
----
// TEST[skip:illustration purposes only]
+
The above means that in order to resolve the disk shortage we need to either drop our disk usage below 90% or have
more than 150GB available. Read more on how this threshold works <<cluster-routing-watermark-high, here>>.
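+
If the shortage were on the frozen tier instead, you would retrieve the
<<cluster-routing-flood-stage-frozen, frozen flood stage watermark>> in the same way. A minimal sketch of that request,
reusing the `filter_path` approach from above:
+
[source,console]
----
GET _cluster/settings?include_defaults&filter_path=*.cluster.routing.allocation.disk.watermark.flood_stage.frozen*
----
// TEST[skip:illustration purposes only]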
. The next step is to find out the current disk usage; this will indicate how much space should be freed. For simplicity,
our example has one node, but you can apply the same steps to every node that is over the relevant threshold.
+
[source,console]
----
GET _cat/allocation?v&s=disk.avail&h=node,disk.percent,disk.avail,disk.total,disk.used,disk.indices,shards
----
+
The response will look like this:
+
[source,console-result]
----
node                disk.percent disk.avail disk.total disk.used disk.indices shards
instance-0000000000           91      4.6gb       35gb    31.1gb       29.9gb    111
// TEST[skip:illustration purposes only]
. The high watermark configuration indicates that the disk usage needs to drop below 90%. Consider allowing some
padding, so the node will not go over the threshold in the near future. In this example, let's release approximately 7GB.
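+
If you prefer to see the same disk figures as JSON, the node stats API also exposes filesystem totals. A minimal sketch
of that alternative view (not an additional required step):
+
[source,console]
----
GET _nodes/stats/fs?filter_path=nodes.*.fs.total
----
// TEST[skip:illustration purposes only]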
. The next step is to list all the indices and choose which replicas to reduce.
+
NOTE: The following command orders the indices by descending number of replicas and descending primary store size. We
do this to help you choose which replicas to reduce, under the assumption that the more replicas an index has, the
smaller the risk of removing a copy, and the bigger the replica, the more space its removal will release. This does not
take any functional requirements into consideration, so please treat it as a suggestion only.
+
[source,console]
----
GET _cat/indices?v&s=rep:desc,pri.store.size:desc&h=health,index,pri,rep,store.size,pri.store.size
----
+
The response will look like:
+
[source,console-result]
----
health index           pri rep store.size pri.store.size
green  my_index          2   3      9.9gb          3.3gb
green  my_other_index    2   3      1.8gb        470.3mb
green  search-products   2   3    278.5kb         69.6kb
green  logs-000001       1   0      7.7gb          7.7gb
// TEST[skip:illustration purposes only]
+
. In the list above we see that if we reduce the number of replicas of `my_index` and `my_other_index` to 1, we will
release the required disk space. It is not necessary to reduce the replicas of `search-products`, and `logs-000001`
does not have any replicas anyway. Reduce the replicas of one or more indices with the <<indices-update-settings,
index update settings API>>:
+
WARNING: Reducing the replicas of an index can potentially reduce search throughput and data redundancy.
+
[source,console]
----
PUT my_index,my_other_index/_settings
{
  "index.number_of_replicas": 1
}
----
// TEST[skip:illustration purposes only]
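+
To confirm that the new value has been applied, you can read the setting back. A minimal sketch, using the same example
indices:
+
[source,console]
----
GET my_index,my_other_index/_settings?filter_path=*.settings.index.number_of_replicas
----
// TEST[skip:illustration purposes only]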
// end::self-managed[]
IMPORTANT: After reducing the replicas, please make sure there are still enough replicas to meet your search
performance and reliability requirements. If not, at your earliest convenience, (i) consider using
<<overview-index-lifecycle-management, Index Lifecycle Management>> to manage the retention of your time series data
more efficiently, (ii) reduce the amount of data you have by disabling the `_source` field or removing less important
data, or (iii) increase your disk capacity.