[8.x] [DOCS] Concept cleanup 2 - ES settings (#119373) (#119642)

This commit is contained in:
shainaraskas 2025-01-10 10:31:16 -05:00 committed by GitHub
parent 8a14c1468d
commit ae3db6042a
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
51 changed files with 959 additions and 856 deletions


@@ -241,7 +241,7 @@ The `discovery-ec2` plugin can automatically set the `aws_availability_zone`
node attribute to the availability zone of each node. This node attribute
allows you to ensure that each shard has copies allocated redundantly across
multiple availability zones by using the
{ref}/shard-allocation-awareness.html#[Allocation Awareness]
feature.
In order to enable the automatic definition of the `aws_availability_zone`
@@ -333,7 +333,7 @@ labelled as `Moderate` or `Low`.
* It is a good idea to distribute your nodes across multiple
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html[availability
zones] and use {ref}/shard-allocation-awareness.html[shard
allocation awareness] to ensure that each shard has copies in more than one
availability zone.


@@ -17,7 +17,7 @@ console. They are _not_ intended for use by applications. For application
consumption, use the <<cluster-nodes-info,nodes info API>>.
====
Returns information about <<custom-node-attributes,custom node attributes>>.
[[cat-nodeattrs-api-request]]
==== {api-request-title}


@@ -35,7 +35,7 @@ one of the following:
master-eligible nodes, all data nodes, all ingest nodes, all voting-only
nodes, all machine learning nodes, and all coordinating-only nodes.
* a pair of patterns, using `*` wildcards, of the form `attrname:attrvalue`,
which adds to the subset all nodes with a <<custom-node-attributes,custom node attribute>> whose name
and value match the respective patterns. Custom node attributes are
configured by setting properties in the configuration file of the form
`node.attr.attrname: attrvalue`.
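For example, a node filter that targets only nodes whose hypothetical `rack` attribute has a value starting with `rack_` (the attribute name and the wildcard value are illustrative, not part of the original text):
[source,console]
--------------------------------------------------
GET /_nodes/rack:rack_*
--------------------------------------------------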


@@ -23,8 +23,8 @@ bin/elasticsearch-node repurpose|unsafe-bootstrap|detach-cluster|override-versio
This tool has a number of modes:
* `elasticsearch-node repurpose` can be used to delete unwanted data from a
node if it used to be a <<data-node-role,data node>> or a
<<master-node-role,master-eligible node>> but has been repurposed not to have one
or other of these roles.
* `elasticsearch-node remove-settings` can be used to remove persistent settings


@@ -43,7 +43,7 @@ Data older than this period can be deleted by {es} at a later time.
**Elastic Curator** is a tool that allows you to manage your indices and snapshots using user-defined filters and predefined actions. If ILM provides the functionality to manage your index lifecycle, and you have at least a Basic license, consider using ILM in place of Curator. Many stack components make use of ILM by default. {curator-ref-current}/ilm.html[Learn more].
NOTE: <<xpack-rollup,Data rollup>> is a deprecated {es} feature that allows you to manage the amount of data that is stored in your cluster, similar to the downsampling functionality of {ilm-init} and data stream lifecycle. This feature should not be used for new deployments.
[TIP]
====


@@ -2,7 +2,7 @@
[[migrate-index-allocation-filters]]
== Migrate index allocation filters to node roles
If you currently use <<custom-node-attributes,custom node attributes>> and
<<shard-allocation-filtering, attribute-based allocation filters>> to
move indices through <<data-tiers, data tiers>> in a
https://www.elastic.co/blog/implementing-hot-warm-cold-in-elasticsearch-with-index-lifecycle-management[hot-warm-cold architecture],


@@ -9,10 +9,16 @@ from any node.
The topics in this section provide information about the architecture of {es} and how it stores and retrieves data:
* <<nodes-shards,Nodes and shards>>: Learn about the basic building blocks of an {es} cluster, including nodes, shards, primaries, and replicas.
* <<node-roles-overview,Node roles>>: Learn about the different roles that nodes can have in an {es} cluster.
* <<docs-replication,Reading and writing documents>>: Learn how {es} replicates read and write operations across shards and shard copies.
* <<shard-allocation-relocation-recovery,Shard allocation, relocation, and recovery>>: Learn how {es} allocates and balances shards across nodes.
** <<shard-allocation-awareness,Shard allocation awareness>>: Learn how to use custom node attributes to distribute shards across different racks or availability zones.
* <<shard-request-cache,Shard request cache>>: Learn how {es} caches search requests to improve performance.
--
include::nodes-shards.asciidoc[]
include::node-roles.asciidoc[]
include::docs/data-replication.asciidoc[leveloffset=-1]
include::modules/shard-ops.asciidoc[]
include::modules/cluster/allocation_awareness.asciidoc[leveloffset=+1]
include::shard-request-cache.asciidoc[leveloffset=-1]


@@ -72,6 +72,45 @@ the granularity of `cold` archival data to monthly or less.
.Downsampled metrics series
image::images/data-streams/time-series-downsampled.png[align="center"]
[discrete]
[[downsample-api-process]]
==== The downsampling process
The downsampling operation traverses the source TSDS index and performs the
following steps:
. Creates a new document for each value of the `_tsid` field and each
`@timestamp` value, rounded to the `fixed_interval` defined in the downsample
configuration.
. For each new document, copies all <<time-series-dimension,time
series dimensions>> from the source index to the target index. Dimensions in a
TSDS are constant, so this is done only once per bucket.
. For each <<time-series-metric,time series metric>> field, computes aggregations
for all documents in the bucket. Depending on the metric type of each metric
field a different set of pre-aggregated results is stored:
** `gauge`: The `min`, `max`, `sum`, and `value_count` are stored; `value_count`
is stored as type `aggregate_metric_double`.
** `counter`: The `last_value` is stored.
. For all other fields, the most recent value is copied to the target index.
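To make the process concrete, this is roughly what a manual downsampling request looks like (the index names and the `1h` interval are placeholders; the source index must be made read-only, for example by setting `index.blocks.write: true`, before it can be downsampled):
[source,console]
--------------------------------------------------
POST /my-tsds-index/_downsample/my-tsds-index-downsampled
{
  "fixed_interval": "1h"
}
--------------------------------------------------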
[discrete]
[[downsample-api-mappings]]
==== Source and target index field mappings
Fields in the target, downsampled index are created based on fields in the
original source index, as follows:
. All fields mapped with the `time-series-dimension` parameter are created in
the target downsample index with the same mapping as in the source index.
. All fields mapped with the `time_series_metric` parameter are created
in the target downsample index with the same mapping as in the source
index. An exception is that for fields mapped as `time_series_metric: gauge`
the field type is changed to `aggregate_metric_double`.
. All other fields that are neither dimensions nor metrics (that is, label
fields), are created in the target downsample index with the same mapping
that they had in the source index.
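As a simplified sketch (the field names are hypothetical and, in a real TSDS, such mappings are usually applied through an index template with `index.mode: time_series`), a source index mapping a `host` dimension and a `temperature` gauge might look as follows; after downsampling, `temperature` would appear in the target index as an `aggregate_metric_double` field:
[source,console]
--------------------------------------------------
PUT /my-tsds-index
{
  "mappings": {
    "properties": {
      "host": { "type": "keyword", "time_series_dimension": true },
      "temperature": { "type": "double", "time_series_metric": "gauge" }
    }
  }
}
--------------------------------------------------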
[discrete]
[[running-downsampling]]
=== Running downsampling on time series data


@@ -190,7 +190,7 @@ tier].
[[configure-data-tiers-on-premise]]
==== Self-managed deployments
For self-managed deployments, each node's <<data-node-role,data role>> is configured
in `elasticsearch.yml`. For example, the highest-performance nodes in a cluster
might be assigned to both the hot and content tiers:
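For example, a minimal `elasticsearch.yml` sketch for such a node (the exact tier roles you assign depend on your own hardware profile):
[source,yaml]
--------------------------------------------------
node.roles: [ data_hot, data_content ]
--------------------------------------------------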


@@ -87,7 +87,7 @@ the same thing, but it's not necessary to use this feature in such a small
cluster.
We recommend you set only one of your two nodes to be
<<master-node-role,master-eligible>>. This means you can be certain which of your
nodes is the elected master of the cluster. The cluster can tolerate the loss of
the other master-ineligible node. If you set both nodes to master-eligible, two
nodes are required for a master election. Since the election will fail if either
@@ -164,12 +164,12 @@ cluster that is suitable for production deployments.
[[high-availability-cluster-design-three-nodes]]
==== Three-node clusters
If you have three nodes, we recommend they all be <<data-node-role,data nodes>> and
every index that is not a <<searchable-snapshots,searchable snapshot index>>
should have at least one replica. Nodes are data nodes by default. You may
prefer for some indices to have two replicas so that each node has a copy of
each shard in those indices. You should also configure each node to be
<<master-node-role,master-eligible>> so that any two of them can hold a master
election without needing to communicate with the third node. Nodes are
master-eligible by default. This cluster will be resilient to the loss of any
single node.
@@ -188,8 +188,8 @@ service provides such a load balancer.
Once your cluster grows to more than three nodes, you can start to specialise
these nodes according to their responsibilities, allowing you to scale their
resources independently as needed. You can have as many <<data-node-role,data
nodes>>, <<ingest,ingest nodes>>, <<ml-node-role,{ml} nodes>>, etc. as needed to
support your workload. As your cluster grows larger, we recommend using
dedicated nodes for each role. This allows you to independently scale resources
for each task.


@@ -11,7 +11,7 @@
For the most up-to-date API details, refer to {api-es}/group/endpoint-ilm[{ilm-cap} APIs].
--
Switches the indices, ILM policies, and legacy, composable and component templates from using <<custom-node-attributes,custom node attributes>> and
<<shard-allocation-filtering, attribute-based allocation filters>> to using <<data-tiers, data tiers>>, and
optionally deletes one legacy index template.
Using node roles enables {ilm-init} to <<data-tier-migration, automatically move the indices>> between


@@ -13,7 +13,7 @@ This setting corresponds to the data node roles:
* <<data-cold-node, data_cold>>
* <<data-frozen-node, data_frozen>>
NOTE: The <<data-node-role, data>> role is not a valid data tier and cannot be used
with the `_tier_preference` setting. The frozen tier stores <<partially-mounted,partially
mounted indices>> exclusively.


@@ -6,7 +6,7 @@ a particular index. These per-index filters are applied in conjunction with
<<cluster-shard-allocation-filtering, cluster-wide allocation filtering>> and
<<shard-allocation-awareness, allocation awareness>>.
Shard allocation filters can be based on <<custom-node-attributes,custom node attributes>> or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference`
attributes. <<index-lifecycle-management, Index lifecycle management>> uses filters based
on custom node attributes to determine how to reallocate shards when moving
@@ -114,7 +114,7 @@ The index allocation settings support the following built-in attributes:
NOTE: `_tier` filtering is based on <<modules-node, node>> roles. Only
a subset of roles are <<data-tiers, data tier>> roles, and the generic
<<data-node-role, data role>> will match any tier filtering.
You can use wildcards when specifying attribute values, for example:


@@ -81,6 +81,8 @@ DELETE _index_template/*
////
// end::downsample-example[]
Check the <<downsampling,Downsampling>> documentation for an overview, details about the downsampling process, and examples of running downsampling manually and as part of an ILM policy.
[[downsample-api-request]]
==== {api-request-title}
@@ -122,43 +124,3 @@ document for each 60 minute (hourly) interval. This follows standard time
formatting syntax as used elsewhere in {es}.
+
NOTE: Smaller, more granular intervals take up proportionally more space.


@@ -27,7 +27,23 @@ include::cluster/shards_allocation.asciidoc[]
include::cluster/disk_allocator.asciidoc[]
[[shard-allocation-awareness-settings]]
==== Shard allocation awareness settings
You can use <<custom-node-attributes,custom node attributes>> as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimize the risk of losing all shard copies in the event of a failure. <<shard-allocation-awareness,Learn more about shard allocation awareness>>.
`cluster.routing.allocation.awareness.attributes`::
(<<dynamic-cluster-setting,Dynamic>>)
The node attributes that {es} should use as awareness attributes. For example, if you have a `rack_id` attribute that specifies the rack in which each node resides, you can set this setting to `rack_id` to ensure that primary and replica shards are not allocated on the same rack. You can specify multiple attributes as a comma-separated list.
`cluster.routing.allocation.awareness.force.*`::
(<<dynamic-cluster-setting,Dynamic>>)
The shard allocation awareness values that must exist for shards to be reallocated in case of location failure. Learn more about <<forced-awareness,forced awareness>>.
include::cluster/allocation_filtering.asciidoc[]
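As a sketch of how these settings fit together (the `rack_id` attribute and the value `rack_one` are illustrative), each node declares its location as a custom attribute and the attribute is then listed as an awareness attribute:
[source,yaml]
--------------------------------------------------
# On each node: declare where the node lives.
node.attr.rack_id: rack_one
# Use that attribute for shard allocation awareness. This setting is dynamic,
# so it can also be applied through the cluster update settings API.
cluster.routing.allocation.awareness.attributes: rack_id
--------------------------------------------------
Forced awareness follows the same pattern, for example `cluster.routing.allocation.awareness.force.zone.values: zone1,zone2` for a hypothetical `zone` attribute.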


@@ -1,5 +1,5 @@
[[shard-allocation-awareness]]
== Shard allocation awareness
You can use custom node attributes as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
@@ -7,12 +7,7 @@ If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimize the risk of losing all shard copies in the event of a failure.
When shard allocation awareness is enabled with the `cluster.routing.allocation.awareness.attributes` setting, shards are only allocated to nodes that have values set for the specified awareness attributes. If you use multiple awareness attributes, {es} considers each attribute separately when allocating shards.
NOTE: The number of attribute values determines how many shard copies are
allocated in each location. If the number of nodes in each location is
@@ -22,11 +17,11 @@ unassigned.
TIP: Learn more about <<high-availability-cluster-design-large-clusters,designing resilient clusters>>.
[[enabling-awareness]]
=== Enabling shard allocation awareness
To enable shard allocation awareness:
. Specify the location of each node with a <<custom-node-attributes,custom node attribute>>. For example,
if you want Elasticsearch to distribute shards across different racks, you might
use an awareness attribute called `rack_id`.
+
@@ -94,7 +89,7 @@ copies of a particular shard from being allocated in the same location, you can
enable forced awareness.
[[forced-awareness]]
=== Forced awareness
By default, if one location fails, {es} spreads its shards across the remaining
locations. This might be undesirable if the cluster does not have sufficient


@@ -6,7 +6,7 @@ allocates shards from any index. These cluster wide filters are applied in
conjunction with <<shard-allocation-filtering, per-index allocation filtering>>
and <<shard-allocation-awareness, allocation awareness>>.
Shard allocation filters can be based on <<custom-node-attributes,custom node attributes>> or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id` and `_tier` attributes.
The `cluster.routing.allocation` settings are <<dynamic-cluster-setting,dynamic>>, enabling live indices to
@@ -59,9 +59,9 @@ The cluster allocation settings support the following built-in attributes:
NOTE: `_tier` filtering is based on <<modules-node, node>> roles. Only
a subset of roles are <<data-tiers, data tier>> roles, and the generic
<<data-node-role, data role>> will match any tier filtering.
You can use wildcards when specifying attribute values, for example:


@@ -41,6 +41,23 @@ on the affected node drops below the high watermark, {es} automatically removes
the write block. Refer to <<fix-watermark-errors,Fix watermark errors>> to
resolve persistent watermark errors.
[NOTE]
.Max headroom settings
===================================================
Max headroom settings apply only when watermark settings are percentages or ratios.
A max headroom value is intended to cap the required free disk space before hitting
the respective watermark. This is useful for servers with larger disks, where a percentage or ratio watermark could translate to an overly large free disk space requirement. In this case, the max headroom can be used to cap the required free disk space amount.
For example, where `cluster.routing.allocation.disk.watermark.flood_stage` is 95% and `cluster.routing.allocation.disk.watermark.flood_stage.max_headroom` is 100GB, this means that:
* For a smaller disk, e.g., of 100GB, the flood watermark will hit at 95%, meaning at 5GB of free space, since 5GB is smaller than the 100GB max headroom value.
* For a larger disk, e.g., of 100TB, the flood watermark will hit at 100GB of free space. That is because the 95% flood watermark alone would require 5TB of free disk space, but is capped by the max headroom setting to 100GB.
Max headroom settings have their default values only if their respective watermark settings are not explicitly set. If watermarks are explicitly set, then the max headroom settings do not have their default values, and need to be explicitly set if they are needed.
===================================================
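For example, a hedged sketch of setting the flood stage watermark and its max headroom explicitly (the values are illustrative, not recommendations):
[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "100GB"
  }
}
--------------------------------------------------
Because the watermark is set explicitly here, the max headroom no longer has a default and must also be set explicitly if it is wanted.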
[[disk-based-shard-allocation-does-not-balance]]
[TIP]
====
@@ -100,18 +117,7 @@ is now `true`. The setting will be removed in a future release.
+
--
(<<dynamic-cluster-setting,Dynamic>>)
Controls the flood stage watermark, which defaults to 95%. {es} enforces a read-only index block (<<index-block-settings,`index.blocks.read_only_allow_delete`>>) on every index that has one or more shards allocated on the node, and that has at least one disk exceeding the flood stage. This setting is a last resort to prevent nodes from running out of disk space. The index block is automatically released when the disk utilization falls below the high watermark. Similarly to the low and high watermark values, it can alternatively be set to a ratio value, e.g., `0.95`, or an absolute byte value.
An example of resetting the read-only index block on the `my-index-000001` index:
[source,console]
--------------------------------------------------
PUT /my-index-000001/_settings
{
"index.blocks.read_only_allow_delete": null
}
--------------------------------------------------
// TEST[setup:my_index]
--
// end::cluster-routing-flood-stage-tag[]
@@ -121,10 +127,10 @@ Defaults to 100GB when
`cluster.routing.allocation.disk.watermark.flood_stage` is not explicitly set.
This caps the amount of free space required.
NOTE: You can't mix the usage of percentage/ratio values and byte values across
the `cluster.routing.allocation.disk.watermark.low`, `cluster.routing.allocation.disk.watermark.high`,
and `cluster.routing.allocation.disk.watermark.flood_stage` settings. Either all values
must be set to percentage/ratio values, or all must be set to byte values. This is required
so that {es} can validate that the settings are internally consistent, ensuring that the
low disk threshold is less than the high disk threshold, and the high disk threshold is
less than the flood stage threshold. A similar comparison check is done for the max
@@ -150,44 +156,6 @@ set. This caps the amount of free space required on dedicated frozen nodes.
cluster. Defaults to `30s`.
NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, because it flips the meaning of high and
low. For example, it makes sense to set the low watermark to 10gb and the high
watermark to 5gb, but not the other way around.
An example of updating the low watermark to at least 100 gigabytes free, a high
watermark of at least 50 gigabytes free, and a flood stage watermark of 10
gigabytes free, and updating the information about the cluster every minute:
[source,console]
--------------------------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": "100gb",
"cluster.routing.allocation.disk.watermark.high": "50gb",
"cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
"cluster.info.update.interval": "1m"
}
}
--------------------------------------------------


@@ -1,6 +1,9 @@
[[misc-cluster-settings]]
=== Miscellaneous cluster settings
[[cluster-name]]
include::{es-ref-dir}/setup/important-settings/cluster-name.asciidoc[]
[discrete]
[[cluster-read-only]]
==== Metadata


@@ -2,7 +2,7 @@
=== Bootstrapping a cluster
Starting an Elasticsearch cluster for the very first time requires the initial
set of <<master-node-role,master-eligible nodes>> to be explicitly defined on one or
more of the master-eligible nodes in the cluster. This is known as _cluster
bootstrapping_. This is only required the first time a cluster starts up.
Freshly-started nodes that are joining a running cluster obtain this
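This initial set is declared with the `cluster.initial_master_nodes` setting, for example (the node names are placeholders for your own master-eligible node names):
[source,yaml]
--------------------------------------------------
cluster.initial_master_nodes:
  - master-node-a
  - master-node-b
  - master-node-c
--------------------------------------------------
Remove this setting once the cluster has formed for the first time.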


@@ -1,5 +1,23 @@
[[cluster-state-overview]]
=== Cluster state
The _cluster state_ is an internal data structure which keeps track of a
variety of information needed by every node, including:
* The identity and attributes of the other nodes in the cluster
* Cluster-wide settings
* Index metadata, including the mapping and settings for each index
* The location and status of every shard copy in the cluster
The elected master node ensures that every node in the cluster has a copy of
the same cluster state. The <<cluster-state,cluster state API>> lets you retrieve a
representation of this internal state for debugging or diagnostic purposes.
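For example, to retrieve just a couple of top-level pieces of the state rather than the full (potentially large) structure (the choice of metrics here is illustrative):
[source,console]
--------------------------------------------------
GET _cluster/state/version,master_node
--------------------------------------------------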
[[cluster-state-publishing]]
==== Publishing the cluster state
The elected master node is the only node in a cluster that can make changes to
the cluster state. The elected master node processes one batch of cluster state
@@ -58,3 +76,16 @@ speed of the storage on each master-eligible node, as well as the reliability
and latency of the network interconnections between all nodes in the cluster.
You must therefore ensure that the storage and networking available to the
nodes in your cluster are good enough to meet your performance goals.
[[dangling-index]]
==== Dangling indices
When a node joins the cluster, if it finds any shards stored in its local
data directory that do not already exist in the cluster state, it will consider
those shards to belong to a "dangling" index. You can list, import or
delete dangling indices using the <<dangling-indices-api,Dangling indices
API>>.
NOTE: The API cannot offer any guarantees as to whether the imported data
truly represents the latest state of the data when the index was still part
of the cluster.
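For example, listing any dangling indices the cluster currently knows about:
[source,console]
--------------------------------------------------
GET /_dangling
--------------------------------------------------
Importing or deleting one then takes the index UUID reported by that listing, for example `POST /_dangling/<index-uuid>?accept_data_loss=true`; the `accept_data_loss` flag is required because of the caveat above.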


@@ -2,7 +2,7 @@
=== Voting configurations
Each {es} cluster has a _voting configuration_, which is the set of
<<master-node-role,master-eligible nodes>> whose responses are counted when making
decisions such as electing a new master or committing a new cluster state.
Decisions are made only after a majority (more than half) of the nodes in the
voting configuration respond.


@@ -1,10 +1,11 @@
[[modules-gateway]]
=== Local gateway settings
[[dangling-indices]]
The local gateway stores the cluster state and shard data across full
cluster restarts.
The following _static_ settings, which must be set on every <<master-node-role,master-eligible node>>,
control how long a freshly elected master should wait before it tries to
recover the <<cluster-state,cluster state>> and the cluster's data.
@@ -37,16 +38,3 @@ gateway.expected_data_nodes: 3
gateway.recover_after_time: 600s
gateway.recover_after_data_nodes: 3
--------------------------------------------------


@@ -5,10 +5,6 @@ The field data cache contains <<fielddata-mapping-param, field data>> and <<eage
which are both used to support aggregations on certain field types.
Since these are on-heap data structures, it is important to monitor the cache's use.
The entries in the cache are expensive to build, so the default behavior is
to keep the cache loaded in memory. The default cache size is unlimited,
causing the cache to grow until it reaches the limit set by the <<fielddata-circuit-breaker, field data circuit breaker>>. This behavior can be configured.
@@ -20,16 +16,12 @@ at the cost of rebuilding the cache as needed.
If the circuit breaker limit is reached, further requests that increase the cache
size will be prevented. In this case you should manually <<indices-clearcache, clear the cache>>.
TIP: You can monitor memory usage for field data as well as the field data circuit
breaker using
the <<cluster-nodes-stats,nodes stats API>> or the <<cat-fielddata,cat fielddata API>>.
`indices.fielddata.cache.size`::
(<<static-cluster-setting,Static>>)
The max size of the field data cache, eg `38%` of node heap space, or an
absolute value, eg `12GB`. Defaults to unbounded. If you choose to set it,
it should be smaller than <<fielddata-circuit-breaker>> limit.
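For example, to cap the cache in `elasticsearch.yml` (the value simply mirrors the illustrative `38%` above):
[source,yaml]
--------------------------------------------------
indices.fielddata.cache.size: 38%
--------------------------------------------------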


@@ -1,4 +1,4 @@
[[shard-request-cache-settings]]
=== Shard request cache settings
When a search request is run against an index or against many indices, each
@@ -10,139 +10,16 @@ The shard-level request cache module caches the local results on each shard.
This allows frequently used (and potentially heavy) search requests to return
results almost instantly. The requests cache is a very good fit for the logging
use case, where only the most recent index is being actively updated --
results from older indices will be served directly from the cache. You can use shard request cache settings to control the size and expiration of the cache.
To learn more about the shard request cache, see <<shard-request-cache>>.
[IMPORTANT]
===================================
By default, the requests cache will only cache the results of search requests
where `size=0`, so it will not cache `hits`,
but it will cache `hits.total`, <<search-aggregations,aggregations>>, and
<<search-suggesters,suggestions>>.
Most queries that use `now` (see <<date-math>>) cannot be cached.
Scripted queries that use the API calls which are non-deterministic, such as
`Math.random()` or `new Date()` are not cached.
===================================
[discrete]
==== Cache invalidation
The cache is smart -- it keeps the same _near real-time_ promise as uncached
search.
Cached results are invalidated automatically whenever the shard refreshes to
pick up changes to the documents or when you update the mapping. In other
words you will always get the same results from the cache as you would for an
uncached search request.
The longer the refresh interval, the longer that cached entries will remain
valid even if there are changes to the documents. If the cache is full, the
least recently used cache keys will be evicted.
The cache can be expired manually with the <<indices-clearcache,`clear-cache` API>>:
[source,console]
------------------------
POST /my-index-000001,my-index-000002/_cache/clear?request=true
------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000002\n/]
[discrete]
==== Enabling and disabling caching
The cache is enabled by default, but can be disabled when creating a new
index as follows:
[source,console]
-----------------------------
PUT /my-index-000001
{
"settings": {
"index.requests.cache.enable": false
}
}
-----------------------------
It can also be enabled or disabled dynamically on an existing index with the
<<indices-update-settings,`update-settings`>> API:
[source,console]
-----------------------------
PUT /my-index-000001/_settings
{ "index.requests.cache.enable": true }
-----------------------------
// TEST[continued]
[discrete]
==== Enabling and disabling caching per request
The `request_cache` query-string parameter can be used to enable or disable
caching on a *per-request* basis. If set, it overrides the index-level setting:
[source,console]
-----------------------------
GET /my-index-000001/_search?request_cache=true
{
"size": 0,
"aggs": {
"popular_colors": {
"terms": {
"field": "colors"
}
}
}
}
-----------------------------
// TEST[continued]
Requests where `size` is greater than 0 will not be cached even if the request cache is
enabled in the index settings. To cache these requests you will need to use the
query-string parameter detailed here.
[discrete]
==== Cache key
A hash of the whole JSON body is used as the cache key. This means that if the JSON
changes -- for instance if keys are output in a different order -- then the
cache key will not be recognised.
TIP: Most JSON libraries support a _canonical_ mode which ensures that JSON
keys are always emitted in the same order. This canonical mode can be used in
the application to ensure that a request is always serialized in the same way.
[discrete]
==== Cache settings
`indices.requests.cache.size`::
(<<static-cluster-setting,Static>>) The maximum size of the cache, as a percentage of the heap. Default: `1%`.
`indices.requests.cache.expire`::
(<<static-cluster-setting,Static>>) The TTL for cached results. Stale results are automatically invalidated when the index is refreshed, so you shouldn't need to use this setting.
[discrete]
==== Monitoring cache usage
The size of the cache (in bytes) and the number of evictions can be viewed
by index, with the <<indices-stats,`indices-stats`>> API:
[source,console]
------------------------
GET /_stats/request_cache?human
------------------------
or by node with the <<cluster-nodes-stats,`nodes-stats`>> API:
[source,console]
------------------------
GET /_nodes/stats/indices/request_cache?human
------------------------
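Since `indices.requests.cache.size` is a static, node-level setting, it is changed in `elasticsearch.yml`, for example (the `2%` value is illustrative):
[source,yaml]
--------------------------------------------------
indices.requests.cache.size: 2%
--------------------------------------------------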


@@ -286,3 +286,22 @@ include::remote-cluster-network.asciidoc[]
include::network/tracers.asciidoc[]
include::network/threading.asciidoc[]
[[readiness-tcp-port]]
==== TCP readiness port
preview::[]
If configured, a node can open a TCP port when the node is in a ready state. A node is deemed
ready when it has successfully joined a cluster. In a single-node configuration, the node is
said to be ready when it's able to accept requests.
To enable the readiness TCP port, use the `readiness.port` setting. The readiness service will bind to
all host addresses.
If the node leaves the cluster, or the <<put-shutdown,Shutdown API>> is used to mark the node
for shutdown, the readiness port is immediately closed.
A successful connection to the readiness TCP port signals that the {es} node is ready. When a client
connects to the readiness port, the server simply terminates the socket connection. No data is sent back
to the client. If a client cannot connect to the readiness port, the node is not ready.
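For example, a minimal `elasticsearch.yml` sketch (the port number is arbitrary):
[source,yaml]
--------------------------------------------------
readiness.port: 9399
--------------------------------------------------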


@@ -1,5 +1,5 @@
[[modules-node]]
=== Node settings
Any time that you start an instance of {es}, you are starting a _node_. A
collection of connected nodes is called a <<modules-cluster,cluster>>. If you
@@ -18,24 +18,33 @@ TIP: The performance of an {es} node is often limited by the performance of the
Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and
<<search-use-faster-hardware,search>>.
[[node-name-settings]]
==== Node name setting
include::{es-ref-dir}/setup/important-settings/node-name.asciidoc[]
[[node-roles]]
==== Node role settings
You define a node's roles by setting `node.roles` in `elasticsearch.yml`. If you
set `node.roles`, the node is only assigned the roles you specify. If you don't
set `node.roles`, the node is assigned the following roles:
* [[master-node]]`master`
* [[data-node]]`data`
* `data_content`
* `data_hot`
* `data_warm`
* `data_cold`
* `data_frozen`
* `ingest`
* [[ml-node]]`ml`
* `remote_cluster_client`
* [[transform-node]]`transform`
The following additional roles are available:
* `voting_only`
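For example, to limit a node to the master and data roles only (the role combination here is illustrative):
[source,yaml]
--------------------------------------------------
node.roles: [ master, data ]
--------------------------------------------------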
[IMPORTANT]
====
@@ -65,386 +74,7 @@ As the cluster grows and in particular if you have large {ml} jobs or
{ctransforms}, consider separating dedicated master-eligible nodes from
dedicated data nodes, {ml} nodes, and {transform} nodes.
To learn more about the available node roles, see <<node-roles-overview>>.
<<master-node,Master-eligible node>>::
A node that has the `master` role, which makes it eligible to be
<<modules-discovery,elected as the _master_ node>>, which controls the cluster.
<<data-node,Data node>>::
A node that has one of several data roles. Data nodes hold data and perform data
related operations such as CRUD, search, and aggregations. A node with a generic `data` role can fill any of the specialized data node roles.
<<node-ingest-node,Ingest node>>::
A node that has the `ingest` role. Ingest nodes are able to apply an
<<ingest,ingest pipeline>> to a document in order to transform and enrich the
document before indexing. With a heavy ingest load, it makes sense to use
dedicated ingest nodes and to not include the `ingest` role from nodes that have
the `master` or `data` roles.
<<remote-node,Remote-eligible node>>::
A node that has the `remote_cluster_client` role, which makes it eligible to act
as a remote client.
<<ml-node,Machine learning node>>::
A node that has the `ml` role. If you want to use {ml-features}, there must be
at least one {ml} node in your cluster. For more information, see
<<ml-settings>> and {ml-docs}/index.html[Machine learning in the {stack}].
<<transform-node,{transform-cap} node>>::
A node that has the `transform` role. If you want to use {transforms}, there
must be at least one {transform} node in your cluster. For more information, see
<<transform-settings>> and <<transforms>>.
[NOTE]
[[coordinating-node]]
.Coordinating node
===============================================
Requests like search requests or bulk-indexing requests may involve data held
on different data nodes. A search request, for example, is executed in two
phases which are coordinated by the node which receives the client request --
the _coordinating node_.
In the _scatter_ phase, the coordinating node forwards the request to the data
nodes which hold the data. Each data node executes the request locally and
returns its results to the coordinating node. In the _gather_ phase, the
coordinating node reduces each data node's results into a single global
result set.
Every node is implicitly a coordinating node. This means that a node that has
an explicit empty list of roles via `node.roles` will only act as a coordinating
node, which cannot be disabled. As a result, such a node needs to have enough
memory and CPU in order to deal with the gather phase.
===============================================
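To create such a dedicated coordinating-only node, set an explicitly empty list of roles:
[source,yaml]
--------------------------------------------------
node.roles: [ ]
--------------------------------------------------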
[[master-node]]
==== Master-eligible node
The master node is responsible for lightweight cluster-wide actions such as
creating or deleting an index, tracking which nodes are part of the cluster,
and deciding which shards to allocate to which nodes. It is important for
cluster health to have a stable master node.
Any master-eligible node that is not a <<voting-only-node,voting-only node>> may
be elected to become the master node by the <<modules-discovery,master election
process>>.
IMPORTANT: Master nodes must have a `path.data` directory whose contents
persist across restarts, just like data nodes, because this is where the
cluster metadata is stored. The cluster metadata describes how to read the data
stored on the data nodes, so if it is lost then the data stored on the data
nodes cannot be read.
[[dedicated-master-node]]
===== Dedicated master-eligible node
It is important for the health of the cluster that the elected master node has
the resources it needs to fulfill its responsibilities. If the elected master
node is overloaded with other tasks then the cluster will not operate well. The
most reliable way to avoid overloading the master with other tasks is to
configure all the master-eligible nodes to be _dedicated master-eligible nodes_
which only have the `master` role, allowing them to focus on managing the
cluster. Master-eligible nodes will still also behave as
<<coordinating-node,coordinating nodes>> that route requests from clients to
the other nodes in the cluster, but you should _not_ use dedicated master nodes
for this purpose.
A small or lightly-loaded cluster may operate well if its master-eligible nodes
have other roles and responsibilities, but once your cluster comprises more
than a handful of nodes it usually makes sense to use dedicated master-eligible
nodes.
To create a dedicated master-eligible node, set:
[source,yaml]
-------------------
node.roles: [ master ]
-------------------
[[voting-only-node]]
===== Voting-only master-eligible node
A voting-only master-eligible node is a node that participates in
<<modules-discovery,master elections>> but which will not act as the cluster's
elected master node. In particular, a voting-only node can serve as a tiebreaker
in elections.
It may seem confusing to use the term "master-eligible" to describe a
voting-only node since such a node is not actually eligible to become the master
at all. This terminology is an unfortunate consequence of history:
master-eligible nodes are those nodes that participate in elections and perform
certain tasks during cluster state publications, and voting-only nodes have the
same responsibilities even if they can never become the elected master.
To configure a master-eligible node as a voting-only node, include `master` and
`voting_only` in the list of roles. For example, to create a voting-only data
node:
[source,yaml]
-------------------
node.roles: [ data, master, voting_only ]
-------------------
IMPORTANT: Only nodes with the `master` role can be marked as having the
`voting_only` role.
High availability (HA) clusters require at least three master-eligible nodes, at
least two of which are not voting-only nodes. Such a cluster will be able to
elect a master node even if one of the nodes fails.
Voting-only master-eligible nodes may also fill other roles in your cluster.
For instance, a node may be both a data node and a voting-only master-eligible
node. A _dedicated_ voting-only master-eligible node is a voting-only
master-eligible node that fills no other roles in the cluster. To create a
dedicated voting-only master-eligible node, set:
[source,yaml]
-------------------
node.roles: [ master, voting_only ]
-------------------
Since dedicated voting-only nodes never act as the cluster's elected master,
they may require less heap and a less powerful CPU than the true master nodes.
However all master-eligible nodes, including voting-only nodes, are on the
critical path for <<cluster-state-publishing,publishing cluster state
updates>>. Cluster state updates are usually independent of
performance-critical workloads such as indexing or searches, but they are
involved in management activities such as index creation and rollover, mapping
updates, and recovery after a failure. The performance characteristics of these
activities are a function of the speed of the storage on each master-eligible
node, as well as the reliability and latency of the network interconnections
between the elected master node and the other nodes in the cluster. You must
therefore ensure that the storage and networking available to the nodes in your
cluster are good enough to meet your performance goals.
[[data-node]]
==== Data nodes
Data nodes hold the shards that contain the documents you have indexed. Data
nodes handle data related operations like CRUD, search, and aggregations.
These operations are I/O-, memory-, and CPU-intensive. It is important to
monitor these resources and to add more data nodes if they are overloaded.
The main benefit of having dedicated data nodes is the separation of the master
and data roles.
In a multi-tier deployment architecture, you use specialized data roles to
assign data nodes to specific tiers: `data_content`, `data_hot`, `data_warm`,
`data_cold`, or `data_frozen`. A node can belong to multiple tiers.
If you want to include a node in all tiers, or if your cluster does not use multiple tiers, then you can use the generic `data` role.
include::../how-to/shard-limits.asciidoc[]
WARNING: If you assign a node to a specific tier using a specialized data role, then you shouldn't also assign it the generic `data` role. The generic `data` role takes precedence over specialized data roles.
[[generic-data-node]]
===== Generic data node
Generic data nodes are included in all content tiers.
To create a dedicated generic data node, set:
[source,yaml]
----
node.roles: [ data ]
----
[[data-content-node]]
===== Content data node
Content data nodes are part of the content tier.
include::{es-ref-dir}/datatiers.asciidoc[tag=content-tier]
To create a dedicated content node, set:
[source,yaml]
----
node.roles: [ data_content ]
----
[[data-hot-node]]
===== Hot data node
Hot data nodes are part of the hot tier.
include::{es-ref-dir}/datatiers.asciidoc[tag=hot-tier]
To create a dedicated hot node, set:
[source,yaml]
----
node.roles: [ data_hot ]
----
[[data-warm-node]]
===== Warm data node
Warm data nodes are part of the warm tier.
include::{es-ref-dir}/datatiers.asciidoc[tag=warm-tier]
To create a dedicated warm node, set:
[source,yaml]
----
node.roles: [ data_warm ]
----
[[data-cold-node]]
===== Cold data node
Cold data nodes are part of the cold tier.
include::{es-ref-dir}/datatiers.asciidoc[tag=cold-tier]
To create a dedicated cold node, set:
[source,yaml]
----
node.roles: [ data_cold ]
----
[[data-frozen-node]]
===== Frozen data node
Frozen data nodes are part of the frozen tier.
include::{es-ref-dir}/datatiers.asciidoc[tag=frozen-tier]
To create a dedicated frozen node, set:
[source,yaml]
----
node.roles: [ data_frozen ]
----
[[node-ingest-node]]
==== Ingest node
Ingest nodes can execute pre-processing pipelines, composed of one or more
ingest processors. Depending on the type of operations performed by the ingest
processors and the required resources, it may make sense to have dedicated
ingest nodes that only perform this specific task.
To create a dedicated ingest node, set:
[source,yaml]
----
node.roles: [ ingest ]
----
[[coordinating-only-node]]
==== Coordinating only node
If you take away the ability to handle master duties, to hold data, and to
pre-process documents, then you are left with a _coordinating_ node that
can only route requests, handle the search reduce phase, and distribute bulk
indexing. Essentially, coordinating only nodes behave as smart load balancers.
Coordinating only nodes can benefit large clusters by offloading the
coordinating node role from data and master-eligible nodes. They join the
cluster and receive the full <<cluster-state,cluster state>>, like every other
node, and they use the cluster state to route requests directly to the
appropriate place(s).
WARNING: Adding too many coordinating only nodes to a cluster can increase the
burden on the entire cluster because the elected master node must await
acknowledgement of cluster state updates from every node! The benefit of
coordinating only nodes should not be overstated -- data nodes can happily
serve the same purpose.
To create a dedicated coordinating node, set:
[source,yaml]
----
node.roles: [ ]
----
[[remote-node]]
==== Remote-eligible node
A remote-eligible node acts as a cross-cluster client and connects to
<<remote-clusters,remote clusters>>. Once connected, you can search
remote clusters using <<modules-cross-cluster-search,{ccs}>>. You can also sync
data between clusters using <<xpack-ccr,{ccr}>>.
To create a dedicated remote-eligible node, set:
[source,yaml]
----
node.roles: [ remote_cluster_client ]
----
[[ml-node]]
==== [xpack]#Machine learning node#
{ml-cap} nodes run jobs and handle {ml} API requests. For more information, see
<<ml-settings>>.
To create a dedicated {ml} node, set:
[source,yaml]
----
node.roles: [ ml, remote_cluster_client]
----
The `remote_cluster_client` role is optional but strongly recommended.
Otherwise, {ccs} fails when used in {ml} jobs or {dfeeds}. If you use {ccs} in
your {anomaly-jobs}, the `remote_cluster_client` role is also required on all
master-eligible nodes. Otherwise, the {dfeed} cannot start. See <<remote-node>>.
[[transform-node]]
==== [xpack]#{transform-cap} node#
{transform-cap} nodes run {transforms} and handle {transform} API requests. For
more information, see <<transform-settings>>.
To create a dedicated {transform} node, set:
[source,yaml]
----
node.roles: [ transform, remote_cluster_client ]
----
The `remote_cluster_client` role is optional but strongly recommended.
Otherwise, {ccs} fails when used in {transforms}. See <<remote-node>>.
[[change-node-role]]
==== Changing the role of a node
Each data node maintains the following data on disk:
* the shard data for every shard allocated to that node,
* the index metadata corresponding with every shard allocated to that node, and
* the cluster-wide metadata, such as settings and index templates.
Similarly, each master-eligible node maintains the following data on disk:
* the index metadata for every index in the cluster, and
* the cluster-wide metadata, such as settings and index templates.
Each node checks the contents of its data path at startup. If it discovers
unexpected data then it will refuse to start. This is to avoid importing
unwanted <<dangling-indices,dangling indices>> which can lead
to a red cluster health. To be more precise, nodes without the `data` role will
refuse to start if they find any shard data on disk at startup, and nodes
that have neither the `master` nor the `data` role will refuse to start if they
find any index metadata on disk at startup.
It is possible to change the roles of a node by adjusting its
`elasticsearch.yml` file and restarting it. This is known as _repurposing_ a
node. In order to satisfy the checks for unexpected data described above, you
must perform some extra steps to prepare a node for repurposing when starting
the node without the `data` or `master` roles.
* If you want to repurpose a data node by removing the `data` role then you
should first use an <<cluster-shard-allocation-filtering,allocation filter>> to safely
migrate all the shard data onto other nodes in the cluster.
* If you want to repurpose a node to have neither the `data` nor `master` roles
then it is simplest to start a brand-new node with an empty data path and the
desired roles. You may find it safest to use an
<<cluster-shard-allocation-filtering,allocation filter>> to migrate the shard data elsewhere
in the cluster first.
If it is not possible to follow these extra steps then you may be able to use
the <<node-tool-repurpose,`elasticsearch-node repurpose`>> tool to delete any
excess data that prevents a node from starting.
[discrete] [discrete]
=== Node data path settings === Node data path settings
@ -495,6 +125,25 @@ modify the contents of the data directory. The data directory contains no
executables so a virus scan will only find false positives. executables so a virus scan will only find false positives.
// end::modules-node-data-path-warning-tag[] // end::modules-node-data-path-warning-tag[]
[[custom-node-attributes]]
==== Custom node attributes
If needed, you can add custom attributes to a node. These attributes can be used to <<cluster-routing-settings,filter which nodes a shard can be allocated to>>, or to group nodes together for <<shard-allocation-awareness,shard allocation awareness>>.
[TIP]
===============================================
You can also set a node attribute using the `-E` command line argument when you start a node:
[source,sh]
--------------------------------------------------------
./bin/elasticsearch -Enode.attr.rack_id=rack_one
--------------------------------------------------------
===============================================
`node.attr.<attribute-name>`::
(<<dynamic-cluster-setting,Dynamic>>)
A custom attribute that you can assign to a node. For example, you might assign a `rack_id` attribute to each node to ensure that primary and replica shards are not allocated on the same rack. You can specify multiple attributes as a comma-separated list.
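For example, a hypothetical `elasticsearch.yml` snippet that assigns a rack and a
zone attribute to a node might look like the following sketch (the attribute
names and values are illustrative, not required):
[source,yaml]
----
# Illustrative custom attributes for this node
node.attr.rack_id: rack_one
node.attr.zone: us-east-1a
----
Attributes like these can then be referenced by allocation settings such as
`cluster.routing.allocation.awareness.attributes` to spread shard copies across
racks or zones.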
[discrete] [discrete]
[[other-node-settings]] [[other-node-settings]]
=== Other node settings === Other node settings

View file

@ -80,7 +80,7 @@ The _gateway nodes_ selection depends on the following criteria:
+ +
* *version*: Remote nodes must be compatible with the cluster they are * *version*: Remote nodes must be compatible with the cluster they are
registered to. registered to.
* *role*: By default, any non-<<master-node,master-eligible>> node can act as a * *role*: By default, any non-<<master-node-role,master-eligible>> node can act as a
gateway node. Dedicated master nodes are never selected as gateway nodes. gateway node. Dedicated master nodes are never selected as gateway nodes.
* *attributes*: You can define the gateway nodes for a cluster by setting * *attributes*: You can define the gateway nodes for a cluster by setting
<<cluster-remote-node-attr,`cluster.remote.node.attr.gateway`>> to `true`. <<cluster-remote-node-attr,`cluster.remote.node.attr.gateway`>> to `true`.

View file

@ -25,7 +25,7 @@ By default, the primary and replica shard copies for an index can be allocated t
You can control how shard copies are allocated using the following settings: You can control how shard copies are allocated using the following settings:
- <<modules-cluster,Cluster-level shard allocation settings>>: Use these settings to control how shard copies are allocated and balanced across the entire cluster. For example, you might want to allocate nodes across availability zones, or prevent certain nodes from being used so you can perform maintenance. - <<modules-cluster,Cluster-level shard allocation settings>>: Use these settings to control how shard copies are allocated and balanced across the entire cluster. For example, you might want to <<shard-allocation-awareness,allocate nodes across availability zones>>, or prevent certain nodes from being used so you can perform maintenance.
- <<index-modules-allocation,Index-level shard allocation settings>>: Use these settings to control how the shard copies for a specific index are allocated. For example, you might want to allocate an index to a node in a specific data tier, or to a node with specific attributes. - <<index-modules-allocation,Index-level shard allocation settings>>: Use these settings to control how the shard copies for a specific index are allocated. For example, you might want to allocate an index to a node in a specific data tier, or to a node with specific attributes.

View file

@ -9,6 +9,7 @@ performance of your {es} cluster.
* <<monitoring-overview>> * <<monitoring-overview>>
* <<how-monitoring-works>> * <<how-monitoring-works>>
* <<logging>>
* <<monitoring-production>> * <<monitoring-production>>
* <<configuring-elastic-agent>> * <<configuring-elastic-agent>>
* <<configuring-metricbeat>> * <<configuring-metricbeat>>
@ -23,6 +24,8 @@ include::overview.asciidoc[]
include::how-monitoring-works.asciidoc[] include::how-monitoring-works.asciidoc[]
include::{es-ref-dir}/setup/logging-config.asciidoc[]
include::production.asciidoc[] include::production.asciidoc[]
include::configuring-elastic-agent.asciidoc[] include::configuring-elastic-agent.asciidoc[]

View file

@ -0,0 +1,437 @@
[[node-roles-overview]]
== Node roles
Any time that you start an instance of {es}, you are starting a _node_. A
collection of connected nodes is called a <<modules-cluster,cluster>>. If you
are running a single node of {es}, then you have a cluster of one node. All nodes know about all the other nodes in the cluster and can forward client
requests to the appropriate node.
Each node performs one or more roles. Roles control the behavior of the node in the cluster.
[discrete]
[[set-node-roles]]
=== Set node roles
You define a node's roles by setting `node.roles` in <<settings,`elasticsearch.yml`>>. If you set `node.roles`, the node is only assigned the roles you specify. If you don't set `node.roles`, the node is assigned the following roles:
* `master`
* `data`
* `data_content`
* `data_hot`
* `data_warm`
* `data_cold`
* `data_frozen`
* `ingest`
* `ml`
* `remote_cluster_client`
* `transform`
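For example, a hypothetical `elasticsearch.yml` entry that explicitly limits a
node to a subset of these roles might look like the following sketch:
[source,yaml]
----
# This node is master-eligible, holds data, and runs ingest pipelines,
# but does not run ML jobs or transforms and is not a cross-cluster client.
node.roles: [ master, data, ingest ]
----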
[IMPORTANT]
====
If you set `node.roles`, ensure you specify every node role your cluster needs.
Every cluster requires the following node roles:
* `master`
* {blank}
+
--
`data_content` and `data_hot` +
OR +
`data`
--
Some {stack} features also require specific node roles:
- {ccs-cap} and {ccr} require the `remote_cluster_client` role.
- {stack-monitor-app} and ingest pipelines require the `ingest` role.
- {fleet}, the {security-app}, and {transforms} require the `transform` role.
The `remote_cluster_client` role is also required to use {ccs} with these
features.
- {ml-cap} features, such as {anomaly-detect}, require the `ml` role.
====
As the cluster grows and in particular if you have large {ml} jobs or
{ctransforms}, consider separating dedicated master-eligible nodes from
dedicated data nodes, {ml} nodes, and {transform} nodes.
[discrete]
[[change-node-role]]
=== Change the role of a node
Each data node maintains the following data on disk:
* the shard data for every shard allocated to that node,
* the index metadata corresponding with every shard allocated to that node, and
* the cluster-wide metadata, such as settings and index templates.
Similarly, each master-eligible node maintains the following data on disk:
* the index metadata for every index in the cluster, and
* the cluster-wide metadata, such as settings and index templates.
Each node checks the contents of its data path at startup. If it discovers
unexpected data then it will refuse to start. This is to avoid importing
unwanted <<dangling-indices,dangling indices>> which can lead
to a red cluster health. To be more precise, nodes without the `data` role will
refuse to start if they find any shard data on disk at startup, and nodes
that have neither the `master` nor the `data` role will refuse to start if they
find any index metadata on disk at startup.
It is possible to change the roles of a node by adjusting its
`elasticsearch.yml` file and restarting it. This is known as _repurposing_ a
node. In order to satisfy the checks for unexpected data described above, you
must perform some extra steps to prepare a node for repurposing when starting
the node without the `data` or `master` roles.
* If you want to repurpose a data node by removing the `data` role then you
should first use an <<cluster-shard-allocation-filtering,allocation filter>> to safely
migrate all the shard data onto other nodes in the cluster.
* If you want to repurpose a node to have neither the `data` nor `master` roles
then it is simplest to start a brand-new node with an empty data path and the
desired roles. You may find it safest to use an
<<cluster-shard-allocation-filtering,allocation filter>> to migrate the shard data elsewhere
in the cluster first.
If it is not possible to follow these extra steps then you may be able to use
the <<node-tool-repurpose,`elasticsearch-node repurpose`>> tool to delete any
excess data that prevents a node from starting.
[discrete]
[[node-roles-list]]
=== Available node roles
The following is a list of the roles that a node can perform in a cluster. A node can have one or more roles.
* <<master-node-role,Master-eligible node>> (`master`): A node that is eligible to be
<<modules-discovery,elected as the _master_ node>>, which controls the cluster.
* <<data-node-role,Data node>> (`data`, `data_content`, `data_hot`, `data_warm`, `data_cold`, `data_frozen`): A node that has one of several data roles. Data nodes hold data and perform data related operations such as CRUD, search, and aggregations. You might use multiple data roles in a cluster so you can implement <<data-tiers,data tiers>>.
* <<node-ingest-node,Ingest node>> (`ingest`): Ingest nodes are able to apply an <<ingest,ingest pipeline>> to a document in order to transform and enrich the document before indexing. With a heavy ingest load, it makes sense to use dedicated ingest nodes and to not assign the `ingest` role to nodes that have the `master` or `data` roles.
* <<remote-node,Remote-eligible node>> (`remote_cluster_client`): A node that is eligible to act as a remote client.
* <<ml-node-role,Machine learning node>> (`ml`): A node that can run {ml-features}. If you want to use {ml-features}, there must be at least one {ml} node in your cluster. For more information, see <<ml-settings>> and {ml-docs}/index.html[Machine learning in the {stack}].
* <<transform-node-role,{transform-cap} node>> (`transform`): A node that can perform {transforms}. If you want to use {transforms}, there must be at least one {transform} node in your cluster. For more information, see <<transform-settings>> and <<transforms>>.
[NOTE]
[[coordinating-node]]
.Coordinating node
===============================================
Requests like search requests or bulk-indexing requests may involve data held
on different data nodes. A search request, for example, is executed in two
phases which are coordinated by the node which receives the client request --
the _coordinating node_.
In the _scatter_ phase, the coordinating node forwards the request to the data
nodes which hold the data. Each data node executes the request locally and
returns its results to the coordinating node. In the _gather_ phase, the
coordinating node reduces each data node's results into a single global
result set.
Every node is implicitly a coordinating node, and this behavior cannot be
disabled. A node with an explicitly empty list of roles in the `node.roles`
setting acts only as a coordinating node. As a result, such a node needs enough
memory and CPU to handle the gather phase.
===============================================
[discrete]
[[master-node-role]]
==== Master-eligible node
The master node is responsible for lightweight cluster-wide actions such as
creating or deleting an index, tracking which nodes are part of the cluster,
and deciding which shards to allocate to which nodes. It is important for
cluster health to have a stable master node.
Any master-eligible node that is not a <<voting-only-node,voting-only node>> may
be elected to become the master node by the <<modules-discovery,master election
process>>.
IMPORTANT: Master nodes must have a `path.data` directory whose contents
persist across restarts, just like data nodes, because this is where the
cluster metadata is stored. The cluster metadata describes how to read the data
stored on the data nodes, so if it is lost then the data stored on the data
nodes cannot be read.
[discrete]
[[dedicated-master-node]]
===== Dedicated master-eligible node
It is important for the health of the cluster that the elected master node has
the resources it needs to fulfill its responsibilities. If the elected master
node is overloaded with other tasks then the cluster will not operate well. The
most reliable way to avoid overloading the master with other tasks is to
configure all the master-eligible nodes to be _dedicated master-eligible nodes_
which only have the `master` role, allowing them to focus on managing the
cluster. Master-eligible nodes will still also behave as
<<coordinating-node,coordinating nodes>> that route requests from clients to
the other nodes in the cluster, but you should _not_ use dedicated master nodes
for this purpose.
A small or lightly-loaded cluster may operate well if its master-eligible nodes
have other roles and responsibilities, but once your cluster comprises more
than a handful of nodes it usually makes sense to use dedicated master-eligible
nodes.
To create a dedicated master-eligible node, set:
[source,yaml]
-------------------
node.roles: [ master ]
-------------------
[discrete]
[[voting-only-node]]
===== Voting-only master-eligible node
A voting-only master-eligible node is a node that participates in
<<modules-discovery,master elections>> but which will not act as the cluster's
elected master node. In particular, a voting-only node can serve as a tiebreaker
in elections.
It may seem confusing to use the term "master-eligible" to describe a
voting-only node since such a node is not actually eligible to become the master
at all. This terminology is an unfortunate consequence of history:
master-eligible nodes are those nodes that participate in elections and perform
certain tasks during cluster state publications, and voting-only nodes have the
same responsibilities even if they can never become the elected master.
To configure a master-eligible node as a voting-only node, include `master` and
`voting_only` in the list of roles. For example, to create a voting-only data
node:
[source,yaml]
-------------------
node.roles: [ data, master, voting_only ]
-------------------
IMPORTANT: Only nodes with the `master` role can be marked as having the
`voting_only` role.
High availability (HA) clusters require at least three master-eligible nodes, at
least two of which are not voting-only nodes. Such a cluster will be able to
elect a master node even if one of the nodes fails.
Voting-only master-eligible nodes may also fill other roles in your cluster.
For instance, a node may be both a data node and a voting-only master-eligible
node. A _dedicated_ voting-only master-eligible node is a voting-only
master-eligible node that fills no other roles in the cluster. To create a
dedicated voting-only master-eligible node, set:
[source,yaml]
-------------------
node.roles: [ master, voting_only ]
-------------------
Since dedicated voting-only nodes never act as the cluster's elected master,
they may require less heap and a less powerful CPU than the true master nodes.
However all master-eligible nodes, including voting-only nodes, are on the
critical path for <<cluster-state-publishing,publishing cluster state
updates>>. Cluster state updates are usually independent of
performance-critical workloads such as indexing or searches, but they are
involved in management activities such as index creation and rollover, mapping
updates, and recovery after a failure. The performance characteristics of these
activities are a function of the speed of the storage on each master-eligible
node, as well as the reliability and latency of the network interconnections
between the elected master node and the other nodes in the cluster. You must
therefore ensure that the storage and networking available to the nodes in your
cluster are good enough to meet your performance goals.
[discrete]
[[data-node-role]]
==== Data nodes
Data nodes hold the shards that contain the documents you have indexed. Data
nodes handle data related operations like CRUD, search, and aggregations.
These operations are I/O-, memory-, and CPU-intensive. It is important to
monitor these resources and to add more data nodes if they are overloaded.
The main benefit of having dedicated data nodes is the separation of the master
and data roles.
In a multi-tier deployment architecture, you use specialized data roles to
assign data nodes to specific tiers: `data_content`, `data_hot`, `data_warm`,
`data_cold`, or `data_frozen`. A node can belong to multiple tiers.
If you want to include a node in all tiers, or if your cluster does not use multiple tiers, then you can use the generic `data` role.
include::{es-ref-dir}/how-to/shard-limits.asciidoc[]
WARNING: If you assign a node to a specific tier using a specialized data role, then you shouldn't also assign it the generic `data` role. The generic `data` role takes precedence over specialized data roles.
[discrete]
[[generic-data-node]]
===== Generic data node
Generic data nodes are included in all content tiers. A node with a generic `data` role can fill any of the specialized data node roles.
To create a dedicated generic data node, set:
[source,yaml]
----
node.roles: [ data ]
----
[discrete]
[[data-content-node]]
===== Content data node
Content data nodes are part of the content tier.
include::{es-ref-dir}/datatiers.asciidoc[tag=content-tier]
To create a dedicated content node, set:
[source,yaml]
----
node.roles: [ data_content ]
----
[discrete]
[[data-hot-node]]
===== Hot data node
Hot data nodes are part of the hot tier.
include::{es-ref-dir}/datatiers.asciidoc[tag=hot-tier]
To create a dedicated hot node, set:
[source,yaml]
----
node.roles: [ data_hot ]
----
[discrete]
[[data-warm-node]]
===== Warm data node
Warm data nodes are part of the warm tier.
include::{es-ref-dir}/datatiers.asciidoc[tag=warm-tier]
To create a dedicated warm node, set:
[source,yaml]
----
node.roles: [ data_warm ]
----
[discrete]
[[data-cold-node]]
===== Cold data node
Cold data nodes are part of the cold tier.
include::{es-ref-dir}/datatiers.asciidoc[tag=cold-tier]
To create a dedicated cold node, set:
[source,yaml]
----
node.roles: [ data_cold ]
----
[discrete]
[[data-frozen-node]]
===== Frozen data node
Frozen data nodes are part of the frozen tier.
include::{es-ref-dir}/datatiers.asciidoc[tag=frozen-tier]
To create a dedicated frozen node, set:
[source,yaml]
----
node.roles: [ data_frozen ]
----
[discrete]
[[node-ingest-node]]
==== Ingest node
Ingest nodes can execute pre-processing pipelines, composed of one or more
ingest processors. Depending on the type of operations performed by the ingest
processors and the required resources, it may make sense to have dedicated
ingest nodes that only perform this specific task.
To create a dedicated ingest node, set:
[source,yaml]
----
node.roles: [ ingest ]
----
[discrete]
[[coordinating-only-node]]
==== Coordinating only node
If you take away the ability to handle master duties, to hold data, and to
pre-process documents, then you are left with a _coordinating_ node that
can only route requests, handle the search reduce phase, and distribute bulk
indexing. Essentially, coordinating only nodes behave as smart load balancers.
Coordinating only nodes can benefit large clusters by offloading the
coordinating node role from data and master-eligible nodes. They join the
cluster and receive the full <<cluster-state,cluster state>>, like every other
node, and they use the cluster state to route requests directly to the
appropriate place(s).
WARNING: Adding too many coordinating only nodes to a cluster can increase the
burden on the entire cluster because the elected master node must await
acknowledgement of cluster state updates from every node! The benefit of
coordinating only nodes should not be overstated -- data nodes can happily
serve the same purpose.
To create a dedicated coordinating node, set:
[source,yaml]
----
node.roles: [ ]
----
[discrete]
[[remote-node]]
==== Remote-eligible node
A remote-eligible node acts as a cross-cluster client and connects to
<<remote-clusters,remote clusters>>. Once connected, you can search
remote clusters using <<modules-cross-cluster-search,{ccs}>>. You can also sync
data between clusters using <<xpack-ccr,{ccr}>>.
To create a dedicated remote-eligible node, set:
[source,yaml]
----
node.roles: [ remote_cluster_client ]
----
[discrete]
[[ml-node-role]]
==== [xpack]#Machine learning node#
{ml-cap} nodes run jobs and handle {ml} API requests. For more information, see
<<ml-settings>>.
To create a dedicated {ml} node, set:
[source,yaml]
----
node.roles: [ ml, remote_cluster_client]
----
The `remote_cluster_client` role is optional but strongly recommended.
Otherwise, {ccs} fails when used in {ml} jobs or {dfeeds}. If you use {ccs} in
your {anomaly-jobs}, the `remote_cluster_client` role is also required on all
master-eligible nodes. Otherwise, the {dfeed} cannot start. See <<remote-node>>.
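For example, a master-eligible node in such a cluster might combine the `master`
and `remote_cluster_client` roles, as in the following sketch (add any other
roles your topology needs):
[source,yaml]
----
node.roles: [ master, remote_cluster_client ]
----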
[discrete]
[[transform-node-role]]
==== [xpack]#{transform-cap} node#
{transform-cap} nodes run {transforms} and handle {transform} API requests. For
more information, see <<transform-settings>>.
To create a dedicated {transform} node, set:
[source,yaml]
----
node.roles: [ transform, remote_cluster_client ]
----
The `remote_cluster_client` role is optional but strongly recommended.
Otherwise, {ccs} fails when used in {transforms}. See <<remote-node>>.

View file

@ -0,0 +1,112 @@
[[path-settings-overview]]
=== Path settings
include::{es-ref-dir}/setup/important-settings/path-settings.asciidoc[]
[[multiple-data-paths]]
==== Multiple data paths
deprecated::[7.13.0]
If needed, you can specify multiple paths in `path.data`. {es} stores the node's
data across all provided paths but keeps each shard's data on the same path.
{es} does not balance shards across a node's data paths. High disk
usage in a single path can trigger a <<disk-based-shard-allocation,high disk
usage watermark>> for the entire node. If triggered, {es} will not add shards to
the node, even if the node's other paths have available disk space. If you need
additional disk space, we recommend you add a new node rather than additional
data paths.
include::{es-ref-dir}/tab-widgets/multi-data-path-widget.asciidoc[]
[[mdp-migrate]]
===== Migrate from multiple data paths
Support for multiple data paths was deprecated in 7.13 and will be removed
in a future release.
As an alternative to multiple data paths, you can create a filesystem which
spans multiple disks with a hardware virtualisation layer such as RAID, or a
software virtualisation layer such as Logical Volume Manager (LVM) on Linux or
Storage Spaces on Windows. If you wish to use multiple data paths on a single
machine then you must run one node for each data path.
If you currently use multiple data paths in a
{ref}/high-availability-cluster-design.html[highly available cluster] then you
can migrate to a setup that uses a single path for each node without downtime
using a process similar to a
{ref}/restart-cluster.html#restart-cluster-rolling[rolling restart]: shut each
node down in turn and replace it with one or more nodes each configured to use
a single data path. In more detail, for each node that currently has multiple
data paths, you should follow this process. In principle you can
perform this migration during a rolling upgrade to 8.0, but we recommend
migrating to a single-data-path setup before starting to upgrade.
1. Take a snapshot to protect your data in case of disaster.
2. Optionally, migrate the data away from the target node by using an
{ref}/modules-cluster.html#cluster-shard-allocation-filtering[allocation filter]:
+
[source,console]
--------------------------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.exclude._name": "target-node-name"
}
}
--------------------------------------------------
+
You can use the {ref}/cat-allocation.html[cat allocation API] to track progress
of this data migration. If some shards do not migrate then the
{ref}/cluster-allocation-explain.html[cluster allocation explain API] will help
you to determine why.
3. Follow the steps in the
{ref}/restart-cluster.html#restart-cluster-rolling[rolling restart process]
up to and including shutting the target node down.
4. Ensure your cluster health is `yellow` or `green`, so that there is a copy
of every shard assigned to at least one of the other nodes in your cluster.
5. If applicable, remove the allocation filter applied in the earlier step.
+
[source,console]
--------------------------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.exclude._name": null
}
}
--------------------------------------------------
6. Discard the data held by the stopped node by deleting the contents of its
data paths.
7. Reconfigure your storage. For instance, combine your disks into a single
filesystem using LVM or Storage Spaces. Ensure that your reconfigured storage
has sufficient space for the data that it will hold.
8. Reconfigure your node by adjusting the `path.data` setting in its
`elasticsearch.yml` file. If needed, install more nodes each with their own
`path.data` setting pointing at a separate data path. A minimal sketch of this step is shown after this list.
9. Start the new nodes and follow the rest of the
{ref}/restart-cluster.html#restart-cluster-rolling[rolling restart process] for
them.
10. Ensure your cluster health is `green`, so that every shard has been
assigned.
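As a minimal sketch of step 8, assuming the combined filesystem is mounted at a
hypothetical `/mnt/combined_data`, the reconfigured `elasticsearch.yml` would
contain a single data path:
[source,yaml]
----
# Single data path after combining the disks into one filesystem
path.data: /mnt/combined_data
----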
You can alternatively add some number of single-data-path nodes to your
cluster, migrate all your data over to these new nodes using
{ref}/modules-cluster.html#cluster-shard-allocation-filtering[allocation filters],
and then remove the old nodes from the cluster. This approach will temporarily
double the size of your cluster so it will only work if you have the capacity to
expand your cluster like this.
If you currently use multiple data paths but your cluster is not highly
available then you can migrate to a non-deprecated configuration by taking
a snapshot, creating a new cluster with the desired configuration and restoring
the snapshot into it.

View file

@ -85,7 +85,7 @@ cross-cluster search requests. Defaults to `true`.
`max_concurrent_searches`:: `max_concurrent_searches`::
(Optional, integer) Maximum number of concurrent searches the API can run. (Optional, integer) Maximum number of concurrent searches the API can run.
Defaults to +max(1, (# of <<data-node,data nodes>> * Defaults to +max(1, (# of <<data-node-role,data nodes>> *
min(<<search-threadpool,search thread pool size>>, 10)))+. min(<<search-threadpool,search thread pool size>>, 10)))+.
`rest_total_hits_as_int`:: `rest_total_hits_as_int`::

View file

@ -97,7 +97,7 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=index-ignore-unavailabl
`max_concurrent_searches`:: `max_concurrent_searches`::
(Optional, integer) (Optional, integer)
Maximum number of concurrent searches the multi search API can execute. Defaults Maximum number of concurrent searches the multi search API can execute. Defaults
to +max(1, (# of <<data-node,data nodes>> * min(<<search-threadpool,search thread pool size>>, 10)))+. to +max(1, (# of <<data-node-role,data nodes>> * min(<<search-threadpool,search thread pool size>>, 10)))+.
`max_concurrent_shard_requests`:: `max_concurrent_shard_requests`::
+ +

View file

@ -19,6 +19,8 @@ at all.
// end::ml-settings-description-tag[] // end::ml-settings-description-tag[]
TIP: To control memory usage used by {ml} jobs, you can use the <<circuit-breakers-page-model-inference, machine learning circuit breaker settings>>.
[discrete] [discrete]
[[general-ml-settings]] [[general-ml-settings]]
==== General machine learning settings ==== General machine learning settings
@ -67,7 +69,7 @@ limitations as described <<ml-settings-description,above>>.
The inference cache exists in the JVM heap on each ingest node. The cache The inference cache exists in the JVM heap on each ingest node. The cache
affords faster processing times for the `inference` processor. The value can be affords faster processing times for the `inference` processor. The value can be
a static byte sized value (such as `2gb`) or a percentage of total allocated a static byte sized value (such as `2gb`) or a percentage of total allocated
heap. Defaults to `40%`. See also <<model-inference-circuit-breaker>>. heap. Defaults to `40%`. See also <<circuit-breakers-page-model-inference>>.
[[xpack-interference-model-ttl]] [[xpack-interference-model-ttl]]
// tag::interference-model-ttl-tag[] // tag::interference-model-ttl-tag[]
@ -250,10 +252,3 @@ nodes in your cluster, you shouldn't use this setting.
If this setting is `true` it also affects the default value for If this setting is `true` it also affects the default value for
`xpack.ml.max_model_memory_limit`. In this case `xpack.ml.max_model_memory_limit` `xpack.ml.max_model_memory_limit`. In this case `xpack.ml.max_model_memory_limit`
defaults to the largest size that could be assigned in the current cluster. defaults to the largest size that could be assigned in the current cluster.
[discrete]
[[model-inference-circuit-breaker]]
==== {ml-cap} circuit breaker settings
The relevant circuit breaker settings can be found in the <<circuit-breakers-page-model-inference, Circuit Breakers page>>.

View file

@ -27,6 +27,8 @@ the only resource-intensive application on the host or container. For example,
you might run {metricbeat} alongside {es} for cluster statistics, but a you might run {metricbeat} alongside {es} for cluster statistics, but a
resource-heavy {ls} deployment should be on its own host. resource-heavy {ls} deployment should be on its own host.
// alphabetized
include::run-elasticsearch-locally.asciidoc[] include::run-elasticsearch-locally.asciidoc[]
include::setup/install.asciidoc[] include::setup/install.asciidoc[]
@ -47,30 +49,28 @@ include::settings/ccr-settings.asciidoc[]
include::modules/discovery/discovery-settings.asciidoc[] include::modules/discovery/discovery-settings.asciidoc[]
include::settings/data-stream-lifecycle-settings.asciidoc[]
include::modules/indices/fielddata.asciidoc[] include::modules/indices/fielddata.asciidoc[]
include::modules/gateway.asciidoc[]
include::settings/health-diagnostic-settings.asciidoc[] include::settings/health-diagnostic-settings.asciidoc[]
include::settings/ilm-settings.asciidoc[] include::settings/ilm-settings.asciidoc[]
include::settings/data-stream-lifecycle-settings.asciidoc[]
include::modules/indices/index_management.asciidoc[] include::modules/indices/index_management.asciidoc[]
include::modules/indices/recovery.asciidoc[] include::modules/indices/recovery.asciidoc[]
include::modules/indices/indexing_buffer.asciidoc[] include::modules/indices/indexing_buffer.asciidoc[]
include::settings/inference-settings.asciidoc[]
include::settings/license-settings.asciidoc[] include::settings/license-settings.asciidoc[]
include::modules/gateway.asciidoc[]
include::setup/logging-config.asciidoc[]
include::settings/ml-settings.asciidoc[] include::settings/ml-settings.asciidoc[]
include::settings/inference-settings.asciidoc[]
include::settings/monitoring-settings.asciidoc[] include::settings/monitoring-settings.asciidoc[]
include::modules/node.asciidoc[] include::modules/node.asciidoc[]
@ -79,6 +79,8 @@ include::modules/network.asciidoc[]
include::modules/indices/query_cache.asciidoc[] include::modules/indices/query_cache.asciidoc[]
include::{es-ref-dir}/path-settings-overview.asciidoc[]
include::modules/indices/search-settings.asciidoc[] include::modules/indices/search-settings.asciidoc[]
include::settings/security-settings.asciidoc[] include::settings/security-settings.asciidoc[]

View file

@ -48,7 +48,7 @@ For more information about discovery and shard allocation, refer to
As nodes are added or removed Elasticsearch maintains an optimal level of fault As nodes are added or removed Elasticsearch maintains an optimal level of fault
tolerance by automatically updating the cluster's _voting configuration_, which tolerance by automatically updating the cluster's _voting configuration_, which
is the set of <<master-node,master-eligible nodes>> whose responses are counted is the set of <<master-node-role,master-eligible nodes>> whose responses are counted
when making decisions such as electing a new master or committing a new cluster when making decisions such as electing a new master or committing a new cluster
state. state.

View file

@ -1,13 +1,7 @@
[[advanced-configuration]] [[advanced-configuration]]
=== Advanced configuration === Set JVM options
Modifying advanced settings is generally not recommended and could negatively
impact performance and stability. Using the {es}-provided defaults
is recommended in most circumstances.
[[set-jvm-options]] [[set-jvm-options]]
==== Set JVM options
If needed, you can override the default JVM options by adding custom options If needed, you can override the default JVM options by adding custom options
files (preferred) or setting the `ES_JAVA_OPTS` environment variable. files (preferred) or setting the `ES_JAVA_OPTS` environment variable.
@ -21,10 +15,14 @@ Where you put the JVM options files depends on the type of installation:
* Docker: Bind mount custom JVM options files into * Docker: Bind mount custom JVM options files into
`/usr/share/elasticsearch/config/jvm.options.d/`. `/usr/share/elasticsearch/config/jvm.options.d/`.
CAUTION: Setting your own JVM options is generally not recommended and could negatively
impact performance and stability. Using the {es}-provided defaults
is recommended in most circumstances.
NOTE: Do not modify the root `jvm.options` file. Use files in `jvm.options.d/` instead. NOTE: Do not modify the root `jvm.options` file. Use files in `jvm.options.d/` instead.
[[jvm-options-syntax]] [[jvm-options-syntax]]
===== JVM options syntax ==== JVM options syntax
A JVM options file contains a line-delimited list of JVM arguments. A JVM options file contains a line-delimited list of JVM arguments.
Arguments are preceded by a dash (`-`). Arguments are preceded by a dash (`-`).
@ -66,7 +64,7 @@ and ignored. Lines that aren't commented out and aren't recognized
as valid JVM arguments are rejected and {es} will fail to start. as valid JVM arguments are rejected and {es} will fail to start.
[[jvm-options-env]] [[jvm-options-env]]
===== Use environment variables to set JVM options ==== Use environment variables to set JVM options
In production, use JVM options files to override the In production, use JVM options files to override the
default settings. In testing and development environments, default settings. In testing and development environments,
@ -155,23 +153,11 @@ options. We do not recommend using `ES_JAVA_OPTS` in production.
NOTE: If you are running {es} as a Windows service, you can change the heap size NOTE: If you are running {es} as a Windows service, you can change the heap size
using the service manager. See <<windows-service>>. using the service manager. See <<windows-service>>.
[[readiness-tcp-port]] [[heap-dump-path]]
===== Enable the Elasticsearch TCP readiness port include::important-settings/heap-dump-path.asciidoc[leveloffset=-1]
preview::[]
If configured, a node can open a TCP port when the node is in a ready state. A node is deemed
ready when it has successfully joined a cluster. In a single-node configuration, the node is
said to be ready when it is able to accept requests.
To enable the readiness TCP port, use the `readiness.port` setting. The readiness service will bind to
all host addresses.
If the node leaves the cluster, or the <<put-shutdown,Shutdown API>> is used to mark the node
for shutdown, the readiness port is immediately closed.
A successful connection to the readiness TCP port signals that the {es} node is ready. When a client
connects to the readiness port, the server simply terminates the socket connection. No data is sent back
to the client. If a client cannot connect to the readiness port, the node is not ready.
[[gc-logging]]
include::important-settings/gc-logging.asciidoc[leveloffset=-1]
[[error-file-path]]
include::important-settings/error-file.asciidoc[leveloffset=-1]

View file

@ -19,10 +19,20 @@ of items which *must* be considered before using your cluster in production:
Our {ess-trial}[{ecloud}] service configures these items automatically, making Our {ess-trial}[{ecloud}] service configures these items automatically, making
your cluster production-ready by default. your cluster production-ready by default.
[[path-settings]]
[discrete]
==== Path settings
include::important-settings/path-settings.asciidoc[] include::important-settings/path-settings.asciidoc[]
Elasticsearch offers a deprecated setting that allows you to specify multiple paths in `path.data`.
To learn about this setting, and how to migrate away from it, refer to <<multiple-data-paths>>.
include::important-settings/cluster-name.asciidoc[] include::important-settings/cluster-name.asciidoc[]
[[node-name]]
[discrete]
==== Node name setting
include::important-settings/node-name.asciidoc[] include::important-settings/node-name.asciidoc[]
include::important-settings/network-host.asciidoc[] include::important-settings/network-host.asciidoc[]

View file

@ -1,4 +1,3 @@
[[cluster-name]]
[discrete] [discrete]
==== Cluster name setting ==== Cluster name setting

View file

@ -1,4 +1,3 @@
[[error-file-path]]
[discrete] [discrete]
==== JVM fatal error log setting ==== JVM fatal error log setting

View file

@ -1,4 +1,3 @@
[[gc-logging]]
[discrete] [discrete]
==== GC logging settings ==== GC logging settings
@ -20,9 +19,8 @@ To see further options not contained in the original JEP, see
https://docs.oracle.com/en/java/javase/13/docs/specs/man/java.html#enable-logging-with-the-jvm-unified-logging-framework[Enable https://docs.oracle.com/en/java/javase/13/docs/specs/man/java.html#enable-logging-with-the-jvm-unified-logging-framework[Enable
Logging with the JVM Unified Logging Framework]. Logging with the JVM Unified Logging Framework].
[[gc-logging-examples]]
[discrete] [discrete]
==== Examples ===== Examples
Change the default GC log output location to `/opt/my-app/gc.log` by Change the default GC log output location to `/opt/my-app/gc.log` by
creating `$ES_HOME/config/jvm.options.d/gc.options` with some sample creating `$ES_HOME/config/jvm.options.d/gc.options` with some sample

View file

@ -1,4 +1,3 @@
[[heap-dump-path]]
[discrete] [discrete]
==== JVM heap dump path setting ==== JVM heap dump path setting

View file

@ -1,7 +1,3 @@
[[node-name]]
[discrete]
==== Node name setting
{es} uses `node.name` as a human-readable identifier for a {es} uses `node.name` as a human-readable identifier for a
particular instance of {es}. This name is included in the response particular instance of {es}. This name is included in the response
of many APIs. The node name defaults to the hostname of the machine when of many APIs. The node name defaults to the hostname of the machine when

View file

@ -1,7 +1,3 @@
[[path-settings]]
[discrete]
==== Path settings
{es} writes the data you index to indices and data streams to a `data` {es} writes the data you index to indices and data streams to a `data`
directory. {es} writes its own application logs, which contain information about directory. {es} writes its own application logs, which contain information about
cluster health and operations, to a `logs` directory. cluster health and operations, to a `logs` directory.
@ -21,112 +17,3 @@ Supported `path.data` and `path.logs` values vary by platform:
include::{es-ref-dir}/tab-widgets/customize-data-log-path-widget.asciidoc[] include::{es-ref-dir}/tab-widgets/customize-data-log-path-widget.asciidoc[]
include::{es-ref-dir}/modules/node.asciidoc[tag=modules-node-data-path-warning-tag] include::{es-ref-dir}/modules/node.asciidoc[tag=modules-node-data-path-warning-tag]
[discrete]
==== Multiple data paths
deprecated::[7.13.0]
If needed, you can specify multiple paths in `path.data`. {es} stores the node's
data across all provided paths but keeps each shard's data on the same path.
{es} does not balance shards across a node's data paths. High disk
usage in a single path can trigger a <<disk-based-shard-allocation,high disk
usage watermark>> for the entire node. If triggered, {es} will not add shards to
the node, even if the node's other paths have available disk space. If you need
additional disk space, we recommend you add a new node rather than additional
data paths.
include::{es-ref-dir}/tab-widgets/multi-data-path-widget.asciidoc[]
[discrete]
[[mdp-migrate]]
==== Migrate from multiple data paths
Support for multiple data paths was deprecated in 7.13 and will be removed
in a future release.
As an alternative to multiple data paths, you can create a filesystem which
spans multiple disks with a hardware virtualisation layer such as RAID, or a
software virtualisation layer such as Logical Volume Manager (LVM) on Linux or
Storage Spaces on Windows. If you wish to use multiple data paths on a single
machine then you must run one node for each data path.
If you currently use multiple data paths in a
{ref}/high-availability-cluster-design.html[highly available cluster] then you
can migrate to a setup that uses a single path for each node without downtime
using a process similar to a
{ref}/restart-cluster.html#restart-cluster-rolling[rolling restart]: shut each
node down in turn and replace it with one or more nodes each configured to use
a single data path. In more detail, for each node that currently has multiple
data paths, you should follow this process. In principle you can
perform this migration during a rolling upgrade to 8.0, but we recommend
migrating to a single-data-path setup before starting to upgrade.
1. Take a snapshot to protect your data in case of disaster.
2. Optionally, migrate the data away from the target node by using an
{ref}/modules-cluster.html#cluster-shard-allocation-filtering[allocation filter]:
+
[source,console]
--------------------------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.exclude._name": "target-node-name"
}
}
--------------------------------------------------
+
You can use the {ref}/cat-allocation.html[cat allocation API] to track progress
of this data migration. If some shards do not migrate then the
{ref}/cluster-allocation-explain.html[cluster allocation explain API] will help
you to determine why.
3. Follow the steps in the
{ref}/restart-cluster.html#restart-cluster-rolling[rolling restart process]
up to and including shutting the target node down.
4. Ensure your cluster health is `yellow` or `green`, so that there is a copy
of every shard assigned to at least one of the other nodes in your cluster.
5. If applicable, remove the allocation filter applied in the earlier step.
+
[source,console]
--------------------------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.exclude._name": null
}
}
--------------------------------------------------
6. Discard the data held by the stopped node by deleting the contents of its
data paths.
7. Reconfigure your storage. For instance, combine your disks into a single
filesystem using LVM or Storage Spaces. Ensure that your reconfigured storage
has sufficient space for the data that it will hold.
8. Reconfigure your node by adjusting the `path.data` setting in its
`elasticsearch.yml` file. If needed, install more nodes each with their own
`path.data` setting pointing at a separate data path.
9. Start the new nodes and follow the rest of the
{ref}/restart-cluster.html#restart-cluster-rolling[rolling restart process] for
them.
10. Ensure your cluster health is `green`, so that every shard has been
assigned.
You can alternatively add some number of single-data-path nodes to your
cluster, migrate all your data over to these new nodes using
{ref}/modules-cluster.html#cluster-shard-allocation-filtering[allocation filters],
and then remove the old nodes from the cluster. This approach will temporarily
double the size of your cluster so it will only work if you have the capacity to
expand your cluster like this.
If you currently use multiple data paths but your cluster is not highly
available then you can migrate to a non-deprecated configuration by taking
a snapshot, creating a new cluster with the desired configuration and restoring
the snapshot into it.

View file

@ -1,5 +1,5 @@
[[logging]] [[logging]]
=== Logging == Elasticsearch application logging
You can use {es}'s application logs to monitor your cluster and diagnose issues. You can use {es}'s application logs to monitor your cluster and diagnose issues.
If you run {es} as a service, the default location of the logs varies based on If you run {es} as a service, the default location of the logs varies based on
@ -11,7 +11,7 @@ If you run {es} from the command line, {es} prints logs to the standard output
(`stdout`). (`stdout`).
[discrete] [discrete]
[[loggin-configuration]] [[logging-configuration]]
=== Logging configuration === Logging configuration
IMPORTANT: Elastic strongly recommends using the Log4j 2 configuration that is shipped by default. IMPORTANT: Elastic strongly recommends using the Log4j 2 configuration that is shipped by default.
@ -304,6 +304,7 @@ The user ID is included in the `X-Opaque-ID` field in deprecation JSON logs.
Deprecation logs can be indexed into the `.logs-deprecation.elasticsearch-default` data stream when the Deprecation logs can be indexed into the `.logs-deprecation.elasticsearch-default` data stream when the
`cluster.deprecation_indexing.enabled` setting is set to true. `cluster.deprecation_indexing.enabled` setting is set to true.
[discrete]
==== Deprecation logs throttling ==== Deprecation logs throttling
:es-rate-limiting-filter-java-doc: {elasticsearch-javadoc}/org/elasticsearch/common/logging/RateLimitingFilter.html :es-rate-limiting-filter-java-doc: {elasticsearch-javadoc}/org/elasticsearch/common/logging/RateLimitingFilter.html
Deprecation logs are deduplicated based on a deprecated feature key Deprecation logs are deduplicated based on a deprecated feature key

View file

@ -0,0 +1,134 @@
[[shard-request-cache]]
=== The shard request cache
When a search request is run against an index or against many indices, each
involved shard executes the search locally and returns its local results to
the _coordinating node_, which combines these shard-level results into a
``global'' result set.
The shard-level request cache module caches the local results on each shard.
This allows frequently used (and potentially heavy) search requests to return
results almost instantly. The requests cache is a very good fit for the logging
use case, where only the most recent index is being actively updated --
results from older indices will be served directly from the cache.
You can control the size and expiration of the cache at the node level using the <<shard-request-cache-settings,shard request cache settings>>.
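As a sketch, node-level `elasticsearch.yml` entries for these controls might look
like the following, assuming the settings described in the linked settings page
(the values here are illustrative, not recommendations):
[source,yaml]
-----------------------------
# Cap the shard request cache at 2% of the heap on this node
indices.requests.cache.size: 2%
# Optionally expire cached entries after one hour; usually unnecessary because
# entries are invalidated automatically when the shard refreshes
indices.requests.cache.expire: 1h
-----------------------------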
[IMPORTANT]
===================================
By default, the requests cache will only cache the results of search requests
where `size=0`, so it will not cache `hits`,
but it will cache `hits.total`, <<search-aggregations,aggregations>>, and
<<search-suggesters,suggestions>>.
Most queries that use `now` (see <<date-math>>) cannot be cached.
Scripted queries that use non-deterministic API calls, such as
`Math.random()` or `new Date()`, are not cached.
===================================
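For example, a request like the following sketch is typically not cached
because of the unrounded `now` in the range query (the `@timestamp` field is
just an assumed example):

[source,console]
--------------------------------------------------
GET /my-index-000001/_search?size=0
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1h"
      }
    }
  }
}
--------------------------------------------------

Rounding the date, for example `now-1h/m`, generally makes such a request
cacheable again. See <<date-math>> for the rounding syntax.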
[discrete]
==== Cache invalidation
The cache is smart -- it keeps the same _near real-time_ promise as uncached
search.
Cached results are invalidated automatically whenever the shard refreshes to
pick up changes to the documents or when you update the mapping. In other
words, you will always get the same results from the cache as you would for an
uncached search request.
The longer the refresh interval, the longer that cached entries will remain
valid even if there are changes to the documents. If the cache is full, the
least recently used cache keys will be evicted.
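For example, lengthening an index's refresh interval keeps cached entries valid
for longer between refreshes, at the cost of new documents becoming visible to
search less often. The `30s` value here is arbitrary:

[source,console]
--------------------------------------------------
PUT /my-index-000001/_settings
{
  "index.refresh_interval": "30s"
}
--------------------------------------------------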
The cache can be expired manually with the <<indices-clearcache,`clear-cache` API>>:
[source,console]
------------------------
POST /my-index-000001,my-index-000002/_cache/clear?request=true
------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000002\n/]
[discrete]
==== Enabling and disabling caching
The cache is enabled by default, but can be disabled when creating a new
index as follows:
[source,console]
-----------------------------
PUT /my-index-000001
{
"settings": {
"index.requests.cache.enable": false
}
}
-----------------------------
It can also be enabled or disabled dynamically on an existing index with the
<<indices-update-settings,`update-settings`>> API:
[source,console]
-----------------------------
PUT /my-index-000001/_settings
{ "index.requests.cache.enable": true }
-----------------------------
// TEST[continued]
[discrete]
==== Enabling and disabling caching per request
The `request_cache` query-string parameter can be used to enable or disable
caching on a *per-request* basis. If set, it overrides the index-level setting:
[source,console]
-----------------------------
GET /my-index-000001/_search?request_cache=true
{
"size": 0,
"aggs": {
"popular_colors": {
"terms": {
"field": "colors"
}
}
}
}
-----------------------------
// TEST[continued]
Requests where `size` is greater than 0 will not be cached even if the request cache is
enabled in the index settings. To cache these requests, you need to set the
`request_cache` query-string parameter explicitly, as in the sketch below.
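For instance, the following sketch returns ten hits and is only cached because
`request_cache=true` is set explicitly; the `colors` term query is an assumed
example:

[source,console]
--------------------------------------------------
GET /my-index-000001/_search?request_cache=true&size=10
{
  "query": {
    "term": {
      "colors": "blue"
    }
  }
}
--------------------------------------------------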
[discrete]
==== Cache key
A hash of the whole JSON body is used as the cache key. This means that if the JSON
changes -- for instance if keys are output in a different order -- then the
cache key will not be recognised.
TIP: Most JSON libraries support a _canonical_ mode which ensures that JSON
keys are always emitted in the same order. This canonical mode can be used in
the application to ensure that a request is always serialized in the same way.
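As an illustration, the two requests below ask for exactly the same aggregation,
but because their bodies serialize the keys in a different order they hash to
different cache keys and will not share a cache entry:

[source,console]
--------------------------------------------------
GET /my-index-000001/_search
{ "size": 0, "aggs": { "popular_colors": { "terms": { "field": "colors" } } } }

GET /my-index-000001/_search
{ "aggs": { "popular_colors": { "terms": { "field": "colors" } } }, "size": 0 }
--------------------------------------------------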
[discrete]
==== Monitoring cache usage
The size of the cache (in bytes) and the number of evictions can be viewed
by index, with the <<indices-stats,`indices-stats`>> API:
[source,console]
------------------------
GET /_stats/request_cache?human
------------------------
or by node with the <<cluster-nodes-stats,`nodes-stats`>> API:
[source,console]
------------------------
GET /_nodes/stats/indices/request_cache?human
------------------------

View file

@@ -75,7 +75,7 @@ POST /_snapshot/my_repository/my_snapshot/_restore
// tag::restore-prereqs[]
* You can only restore a snapshot to a running cluster with an elected
-<<master-node,master node>>. The snapshot's repository must be
+<<master-node-role,master node>>. The snapshot's repository must be
<<snapshots-register-repository,registered>> and available to the cluster.
* The snapshot and cluster versions must be compatible. See

View file

@@ -46,7 +46,7 @@ taking snapshots at different time intervals.
include::register-repository.asciidoc[tag=kib-snapshot-prereqs]
* You can only take a snapshot from a running cluster with an elected
-<<master-node,master node>>.
+<<master-node-role,master node>>.
* A snapshot repository must be <<snapshots-register-repository,registered>> and
available to the cluster.

View file

@@ -11,7 +11,7 @@
To use {transforms}, you must have:
-* at least one <<transform-node,{transform} node>>,
+* at least one <<transform-node-role,{transform} node>>,
* management features visible in the {kib} space, and
* security privileges that:
+

View file

@@ -5,7 +5,7 @@ starting to replicate the shards on that node to other nodes in the cluster,
which can involve a lot of I/O. Since the node is shortly going to be
restarted, this I/O is unnecessary. You can avoid racing the clock by
<<cluster-routing-allocation-enable,disabling allocation>> of replicas before
-shutting down <<data-node,data nodes>>:
+shutting down <<data-node-role,data nodes>>:
[source,console]
--------------------------------------------------