Add to allocation architecture guide (#125328)

How master and data nodes communicate
about shard allocation
This commit is contained in:
Dianna Hohensee 2025-04-18 14:56:27 -04:00 committed by GitHub
parent 26254e3f42
commit 72b4ed255b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

View file

@ -195,6 +195,18 @@ works in parallel with the storage engine.)
# Allocation
### Indexes and Shards
Each index consists of a fixed number of primary shards. The number of primary shards cannot be changed for the lifetime of the index. Each
primary shard can have zero-to-many replicas used for data redundancy. The number of replicas per shard can be changed dynamically.
The allocation assignment status of each shard copy is tracked by its [ShardRoutingState][]. The `RoutingTable` and `RoutingNodes` objects
are responsible for tracking the data nodes to which each shard in the cluster is allocated: see the [routing package javadoc][] for more
details about these structures.
[routing package javadoc]: https://github.com/elastic/elasticsearch/blob/v9.0.0-beta1/server/src/main/java/org/elasticsearch/cluster/routing/package-info.java
[ShardRoutingState]: https://github.com/elastic/elasticsearch/blob/4c9c82418ed98613edcd91e4d8f818eeec73ce92/server/src/main/java/org/elasticsearch/cluster/routing/ShardRoutingState.java#L12-L46
### Core Components
The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce
@ -235,6 +247,22 @@ of shards, and an incentive to distribute shards within the same index across di
`NodeAllocationStatsAndWeightsCalculator` classes for more details on the weight calculations that support the `DesiredBalanceComputer`
decisions.
### Inter-Node Communicaton
The elected master node creates a shard allocation plan with the `DesiredBalanceShardsAllocator` and then selects incremental shard
movements towards the target allocation plan with the `DesiredBalanceReconciler`. The results of the `DesiredBalanceReconciler` is an
updated `RoutingTable`. The `RoutingTable` is part of the cluster state, so the master node updates the cluster state with the new
(incremental) desired shard allocation information. The updated cluster state is then published to the data nodes. Each data node will
observe any change in shard allocation related to itself and take action to achieve the new shard allocation by: initiating creation of a
new empty shard; starting recovery (copying) of an existing shard from another data node; or removing a shard. When the data node finishes
a shard change, a request is sent to the master node to update the shard as having finished recovery/removal in the cluster state. The
cluster state is used by allocation as a fancy work queue: the master node conveys new work to the data nodes, which pick up the work and
report back when done.
- See `DesiredBalanceShardsAllocator#submitReconcileTask` for the master node's cluster state update post-reconciliation.
- See `IndicesClusterStateService#doApplyClusterState` for the data node hook to observe shard changes in the cluster state.
- See `ShardStateAction#sendShardAction` for the data node request to the master node on completion of a shard state change.
# Autoscaling
The Autoscaling API in ES (Elasticsearch) uses cluster and node level statistics to provide a recommendation