mirror of https://github.com/elastic/elasticsearch.git (synced 2025-04-24 23:27:25 -04:00)
Weaken language about "low-latency" networks (#89198)
Today we say that voting-only nodes require a "low-latency" network. This term has a specific meaning in some operating environments, and that meaning differs from the one we intend. To avoid this confusion, this commit removes the absolute term "low-latency" in favour of describing the requirements relative to the user's own performance goals.
parent 9dd47d8a92
commit c9d4892929
3 changed files with 66 additions and 46 deletions
@@ -338,12 +338,16 @@ You should use <<allocation-awareness,shard allocation awareness>> to ensure
 that there is a copy of each shard in each zone. This means either zone remains
 fully available if the other zone fails.
 
-All master-eligible nodes, including voting-only nodes, are on the critical path
-for publishing cluster state updates. Because of this, these nodes require
-reasonably fast persistent storage and a reliable, low-latency network
-connection to the rest of the cluster. If you add a tiebreaker node in a third
-independent zone then you must make sure it has adequate resources and good
-connectivity to the rest of the cluster.
+All master-eligible nodes, including voting-only nodes, are on the critical
+path for <<cluster-state-publishing,publishing cluster state updates>>. Cluster
+state updates are usually independent of performance-critical workloads such as
+indexing or searches, but they are involved in management activities such as
+index creation and rollover, mapping updates, and recovery after a failure. The
+performance characteristics of these activities are a function of the speed of
+the storage on each master-eligible node, as well as the reliability and
+latency of the network interconnections between all nodes in the cluster. You
+must therefore ensure that the storage and networking available to the
+nodes in your cluster are good enough to meet your performance goals.
 
 [[high-availability-cluster-design-three-zones]]
 ==== Clusters with three or more zones
@@ -1,38 +1,40 @@
 [[cluster-state-publishing]]
 === Publishing the cluster state
 
-The master node is the only node in a cluster that can make changes to the
-cluster state. The master node processes one batch of cluster state updates at
-a time, computing the required changes and publishing the updated cluster state
-to all the other nodes in the cluster. Each publication starts with the master
-broadcasting the updated cluster state to all nodes in the cluster. Each node
-responds with an acknowledgement but does not yet apply the newly-received
-state. Once the master has collected acknowledgements from enough
-master-eligible nodes, the new cluster state is said to be _committed_ and the
-master broadcasts another message instructing nodes to apply the now-committed
-state. Each node receives this message, applies the updated state, and then
-sends a second acknowledgement back to the master.
+The elected master node is the only node in a cluster that can make changes to
+the cluster state. The elected master node processes one batch of cluster state
+updates at a time, computing the required changes and publishing the updated
+cluster state to all the other nodes in the cluster. Each publication starts
+with the elected master broadcasting the updated cluster state to all nodes in
+the cluster. Each node responds with an acknowledgement but does not yet apply
+the newly-received state. Once the elected master has collected
+acknowledgements from enough master-eligible nodes, the new cluster state is
+said to be _committed_ and the master broadcasts another message instructing
+nodes to apply the now-committed state. Each node receives this message,
+applies the updated state, and then sends a second acknowledgement back to the
+master.
 
-The master allows a limited amount of time for each cluster state update to be
-completely published to all nodes. It is defined by the
+The elected master allows a limited amount of time for each cluster state
+update to be completely published to all nodes. It is defined by the
 `cluster.publish.timeout` setting, which defaults to `30s`, measured from the
 time the publication started. If this time is reached before the new cluster
-state is committed then the cluster state change is rejected and the master
-considers itself to have failed. It stands down and starts trying to elect a
-new master.
+state is committed then the cluster state change is rejected and the elected
+master considers itself to have failed. It stands down and starts trying to
+elect a new master node.
 
 If the new cluster state is committed before `cluster.publish.timeout` has
-elapsed, the master node considers the change to have succeeded. It waits until
-the timeout has elapsed or until it has received acknowledgements that each
-node in the cluster has applied the updated state, and then starts processing
-and publishing the next cluster state update. If some acknowledgements have not
-been received (i.e. some nodes have not yet confirmed that they have applied
-the current update), these nodes are said to be _lagging_ since their cluster
-states have fallen behind the master's latest state. The master waits for the
-lagging nodes to catch up for a further time, `cluster.follower_lag.timeout`,
-which defaults to `90s`. If a node has still not successfully applied the
-cluster state update within this time then it is considered to have failed and
-is removed from the cluster.
+elapsed, the elected master node considers the change to have succeeded. It
+waits until the timeout has elapsed or until it has received acknowledgements
+that each node in the cluster has applied the updated state, and then starts
+processing and publishing the next cluster state update. If some
+acknowledgements have not been received (i.e. some nodes have not yet confirmed
+that they have applied the current update), these nodes are said to be
+_lagging_ since their cluster states have fallen behind the elected master's
+latest state. The elected master waits for the lagging nodes to catch up for a
+further time, `cluster.follower_lag.timeout`, which defaults to `90s`. If a
+node has still not successfully applied the cluster state update within this
+time then it is considered to have failed and the elected master removes it
+from the cluster.
 
 Cluster state updates are typically published as diffs to the previous cluster
 state, which reduces the time and network bandwidth needed to publish a cluster
@@ -40,12 +42,19 @@ state update. For example, when updating the mappings for only a subset of the
 indices in the cluster state, only the updates for those indices need to be
 published to the nodes in the cluster, as long as those nodes have the previous
 cluster state. If a node is missing the previous cluster state, for example
-when rejoining a cluster, the master will publish the full cluster state to
-that node so that it can receive future updates as diffs.
+when rejoining a cluster, the elected master will publish the full cluster
+state to that node so that it can receive future updates as diffs.
 
 NOTE: {es} is a peer to peer based system, in which nodes communicate with one
 another directly. The high-throughput APIs (index, delete, search) do not
-normally interact with the master node. The responsibility of the master node
-is to maintain the global cluster state and reassign shards when nodes join or
-leave the cluster. Each time the cluster state is changed, the new state is
-published to all nodes in the cluster as described above.
+normally interact with the elected master node. The responsibility of the
+elected master node is to maintain the global cluster state which includes
+reassigning shards when nodes join or leave the cluster. Each time the cluster
+state is changed, the new state is published to all nodes in the cluster as
+described above.
+
+The performance characteristics of cluster state updates are a function of the
+speed of the storage on each master-eligible node, as well as the reliability
+and latency of the network interconnections between all nodes in the cluster.
+You must therefore ensure that the storage and networking available to the
+nodes in your cluster are good enough to meet your performance goals.
@@ -194,13 +194,6 @@ High availability (HA) clusters require at least three master-eligible nodes, at
 least two of which are not voting-only nodes. Such a cluster will be able to
 elect a master node even if one of the nodes fails.
 
-Since voting-only nodes never act as the cluster's elected master, they may
-require less heap and a less powerful CPU than the true master nodes.
-However all master-eligible nodes, including voting-only nodes, require
-reasonably fast persistent storage and a reliable and low-latency network
-connection to the rest of the cluster, since they are on the critical path for
-<<cluster-state-publishing,publishing cluster state updates>>.
-
 Voting-only master-eligible nodes may also fill other roles in your cluster.
 For instance, a node may be both a data node and a voting-only master-eligible
 node. A _dedicated_ voting-only master-eligible nodes is a voting-only
@@ -212,6 +205,20 @@ dedicated voting-only master-eligible node, set:
 node.roles: [ master, voting_only ]
 -------------------
 
+Since dedicated voting-only nodes never act as the cluster's elected master,
+they may require less heap and a less powerful CPU than the true master nodes.
+However all master-eligible nodes, including voting-only nodes, are on the
+critical path for <<cluster-state-publishing,publishing cluster state
+updates>>. Cluster state updates are usually independent of
+performance-critical workloads such as indexing or searches, but they are
+involved in management activities such as index creation and rollover, mapping
+updates, and recovery after a failure. The performance characteristics of these
+activities are a function of the speed of the storage on each master-eligible
+node, as well as the reliability and latency of the network interconnections
+between the elected master node and the other nodes in the cluster. You must
+therefore ensure that the storage and networking available to the nodes in your
+cluster are good enough to meet your performance goals.
+
 [[data-node]]
 ==== Data node
 
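The publication timing rules described in the "Publishing the cluster state" text of this commit can be illustrated with a small model. The following is only a sketch under stated assumptions, not Elasticsearch code: the `NodeTiming` class, the `commit_time` and `publication_outcome` functions, and the `quorum` parameter are hypothetical names introduced here. The docs say only that "enough" master-eligible nodes must acknowledge, so the caller supplies the quorum size. The sketch also assumes that the follower lag timer starts when the publish timeout expires, which is a simplification. The two default timeouts are taken from the documentation text above.

```python
from dataclasses import dataclass
from typing import List, Optional

PUBLISH_TIMEOUT_S = 30.0       # documented default of cluster.publish.timeout
FOLLOWER_LAG_TIMEOUT_S = 90.0  # documented default of cluster.follower_lag.timeout


@dataclass
class NodeTiming:
    """Hypothetical timings, in seconds since the publication started."""
    name: str
    master_eligible: bool
    ack_at: Optional[float]      # first acknowledgement received, or None if never
    applied_at: Optional[float]  # "applied" acknowledgement received, or None if never


def commit_time(nodes: List[NodeTiming], quorum: int) -> Optional[float]:
    """Time at which the quorum-th master-eligible acknowledgement arrived.

    Returns None if that never happens within the publish timeout.
    """
    acks = sorted(
        n.ack_at for n in nodes if n.master_eligible and n.ack_at is not None
    )
    if len(acks) < quorum:
        return None
    reached = acks[quorum - 1]
    return reached if reached < PUBLISH_TIMEOUT_S else None


def publication_outcome(nodes: List[NodeTiming], quorum: int) -> dict:
    """Classify one publication: committed or not, lagging nodes, removed nodes."""
    committed_at = commit_time(nodes, quorum)
    if committed_at is None:
        # Not committed in time: the update is rejected and the master stands down.
        return {"committed": False, "lagging": [], "removed": []}

    lagging, removed = [], []
    # Assumption: the lag timer runs from the end of the publish timeout.
    lag_deadline = PUBLISH_TIMEOUT_S + FOLLOWER_LAG_TIMEOUT_S
    for n in nodes:
        if n.applied_at is not None and n.applied_at <= PUBLISH_TIMEOUT_S:
            continue  # applied in time, not lagging
        lagging.append(n.name)
        if n.applied_at is None or n.applied_at > lag_deadline:
            removed.append(n.name)  # still not applied after the further wait
    return {"committed": True, "lagging": lagging, "removed": removed}


example_nodes = [
    NodeTiming("m1", True, 0.1, 0.5),
    NodeTiming("m2", True, 0.2, 0.6),
    NodeTiming("m3", True, None, None),   # unreachable master-eligible node
    NodeTiming("d1", False, 0.3, 45.0),   # slow data node that catches up later
]
print(publication_outcome(example_nodes, quorum=2))
```

In this model a slow or unreachable node does not block the commit as long as the quorum is met, but it does delay the next publication until the publish timeout expires, which is why the commit describes storage and network requirements relative to the user's performance goals.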