Weaken language about "low-latency" networks (#89198)

Today we say that voting-only nodes require a "low-latency" network.
This term has a specific meaning in some operating environments which is
different from our intended meaning. To avoid this confusion this commit
removes the absolute term "low-latency" in favour of describing the
requirements relative to the user's own performance goals.
David Turner 2022-08-09 13:15:37 +01:00 committed by GitHub
parent 9dd47d8a92
commit c9d4892929
GPG key ID: 4AEE18F83AFDEB23
3 changed files with 66 additions and 46 deletions

@@ -338,12 +338,16 @@ You should use <<allocation-awareness,shard allocation awareness>> to ensure
 that there is a copy of each shard in each zone. This means either zone remains
 fully available if the other zone fails.
 
-All master-eligible nodes, including voting-only nodes, are on the critical path
-for publishing cluster state updates. Because of this, these nodes require
-reasonably fast persistent storage and a reliable, low-latency network
-connection to the rest of the cluster. If you add a tiebreaker node in a third
-independent zone then you must make sure it has adequate resources and good
-connectivity to the rest of the cluster.
+All master-eligible nodes, including voting-only nodes, are on the critical
+path for <<cluster-state-publishing,publishing cluster state updates>>. Cluster
+state updates are usually independent of performance-critical workloads such as
+indexing or searches, but they are involved in management activities such as
+index creation and rollover, mapping updates, and recovery after a failure. The
+performance characteristics of these activities are a function of the speed of
+the storage on each master-eligible node, as well as the reliability and
+latency of the network interconnections between all nodes in the cluster. You
+must therefore ensure that the storage and networking available to the
+nodes in your cluster are good enough to meet your performance goals.
 
 [[high-availability-cluster-design-three-zones]]
 ==== Clusters with three or more zones

@@ -1,38 +1,40 @@
 [[cluster-state-publishing]]
 === Publishing the cluster state
 
-The master node is the only node in a cluster that can make changes to the
-cluster state. The master node processes one batch of cluster state updates at
-a time, computing the required changes and publishing the updated cluster state
-to all the other nodes in the cluster. Each publication starts with the master
-broadcasting the updated cluster state to all nodes in the cluster. Each node
-responds with an acknowledgement but does not yet apply the newly-received
-state. Once the master has collected acknowledgements from enough
-master-eligible nodes, the new cluster state is said to be _committed_ and the
-master broadcasts another message instructing nodes to apply the now-committed
-state. Each node receives this message, applies the updated state, and then
-sends a second acknowledgement back to the master.
+The elected master node is the only node in a cluster that can make changes to
+the cluster state. The elected master node processes one batch of cluster state
+updates at a time, computing the required changes and publishing the updated
+cluster state to all the other nodes in the cluster. Each publication starts
+with the elected master broadcasting the updated cluster state to all nodes in
+the cluster. Each node responds with an acknowledgement but does not yet apply
+the newly-received state. Once the elected master has collected
+acknowledgements from enough master-eligible nodes, the new cluster state is
+said to be _committed_ and the master broadcasts another message instructing
+nodes to apply the now-committed state. Each node receives this message,
+applies the updated state, and then sends a second acknowledgement back to the
+master.
 
-The master allows a limited amount of time for each cluster state update to be
-completely published to all nodes. It is defined by the
+The elected master allows a limited amount of time for each cluster state
+update to be completely published to all nodes. It is defined by the
 `cluster.publish.timeout` setting, which defaults to `30s`, measured from the
 time the publication started. If this time is reached before the new cluster
-state is committed then the cluster state change is rejected and the master
-considers itself to have failed. It stands down and starts trying to elect a
-new master.
+state is committed then the cluster state change is rejected and the elected
+master considers itself to have failed. It stands down and starts trying to
+elect a new master node.
 
 If the new cluster state is committed before `cluster.publish.timeout` has
-elapsed, the master node considers the change to have succeeded. It waits until
-the timeout has elapsed or until it has received acknowledgements that each
-node in the cluster has applied the updated state, and then starts processing
-and publishing the next cluster state update. If some acknowledgements have not
-been received (i.e. some nodes have not yet confirmed that they have applied
-the current update), these nodes are said to be _lagging_ since their cluster
-states have fallen behind the master's latest state. The master waits for the
-lagging nodes to catch up for a further time, `cluster.follower_lag.timeout`,
-which defaults to `90s`. If a node has still not successfully applied the
-cluster state update within this time then it is considered to have failed and
-is removed from the cluster.
+elapsed, the elected master node considers the change to have succeeded. It
+waits until the timeout has elapsed or until it has received acknowledgements
+that each node in the cluster has applied the updated state, and then starts
+processing and publishing the next cluster state update. If some
+acknowledgements have not been received (i.e. some nodes have not yet confirmed
+that they have applied the current update), these nodes are said to be
+_lagging_ since their cluster states have fallen behind the elected master's
+latest state. The elected master waits for the lagging nodes to catch up for a
+further time, `cluster.follower_lag.timeout`, which defaults to `90s`. If a
+node has still not successfully applied the cluster state update within this
+time then it is considered to have failed and the elected master removes it
+from the cluster.
 
 Cluster state updates are typically published as diffs to the previous cluster
 state, which reduces the time and network bandwidth needed to publish a cluster
@@ -40,12 +42,19 @@ state update. For example, when updating the mappings for only a subset of the
 indices in the cluster state, only the updates for those indices need to be
 published to the nodes in the cluster, as long as those nodes have the previous
 cluster state. If a node is missing the previous cluster state, for example
-when rejoining a cluster, the master will publish the full cluster state to
-that node so that it can receive future updates as diffs.
+when rejoining a cluster, the elected master will publish the full cluster
+state to that node so that it can receive future updates as diffs.
 
 NOTE: {es} is a peer to peer based system, in which nodes communicate with one
 another directly. The high-throughput APIs (index, delete, search) do not
-normally interact with the master node. The responsibility of the master node
-is to maintain the global cluster state and reassign shards when nodes join or
-leave the cluster. Each time the cluster state is changed, the new state is
-published to all nodes in the cluster as described above.
+normally interact with the elected master node. The responsibility of the
+elected master node is to maintain the global cluster state which includes
+reassigning shards when nodes join or leave the cluster. Each time the cluster
+state is changed, the new state is published to all nodes in the cluster as
+described above.
+
+The performance characteristics of cluster state updates are a function of the
+speed of the storage on each master-eligible node, as well as the reliability
+and latency of the network interconnections between all nodes in the cluster.
+You must therefore ensure that the storage and networking available to the
+nodes in your cluster are good enough to meet your performance goals.

@@ -194,13 +194,6 @@ High availability (HA) clusters require at least three master-eligible nodes, at
 least two of which are not voting-only nodes. Such a cluster will be able to
 elect a master node even if one of the nodes fails.
 
-Since voting-only nodes never act as the cluster's elected master, they may
-require less heap and a less powerful CPU than the true master nodes.
-However all master-eligible nodes, including voting-only nodes, require
-reasonably fast persistent storage and a reliable and low-latency network
-connection to the rest of the cluster, since they are on the critical path for
-<<cluster-state-publishing,publishing cluster state updates>>.
-
 Voting-only master-eligible nodes may also fill other roles in your cluster.
 For instance, a node may be both a data node and a voting-only master-eligible
 node. A _dedicated_ voting-only master-eligible nodes is a voting-only
@@ -212,6 +205,20 @@ dedicated voting-only master-eligible node, set:
 node.roles: [ master, voting_only ]
 -------------------
 
+Since dedicated voting-only nodes never act as the cluster's elected master,
+they may require less heap and a less powerful CPU than the true master nodes.
+However all master-eligible nodes, including voting-only nodes, are on the
+critical path for <<cluster-state-publishing,publishing cluster state
+updates>>. Cluster state updates are usually independent of
+performance-critical workloads such as indexing or searches, but they are
+involved in management activities such as index creation and rollover, mapping
+updates, and recovery after a failure. The performance characteristics of these
+activities are a function of the speed of the storage on each master-eligible
+node, as well as the reliability and latency of the network interconnections
+between the elected master node and the other nodes in the cluster. You must
+therefore ensure that the storage and networking available to the nodes in your
+cluster are good enough to meet your performance goals.
+
 [[data-node]]
 ==== Data node