Add to allocation architecture guide (#125328)

How master and data nodes communicate about shard allocation
2025-06-28 09:28:55 -04:00 · 2025-04-18 14:56:27 -04:00 · 2025-04-18 14:56:27 -04:00 · 72b4ed255b
commit 72b4ed255b
parent 26254e3f42
1 changed files with 28 additions and 0 deletions
--- a/docs/internal/DistributedArchitectureGuide.md
+++ b/docs/internal/DistributedArchitectureGuide.md
@ -195,6 +195,18 @@ works in parallel with the storage engine.)

 # Allocation

+### Indexes and Shards
+
+Each index consists of a fixed number of primary shards. The number of primary shards cannot be changed for the lifetime of the index. Each
+primary shard can have zero-to-many replicas used for data redundancy. The number of replicas per shard can be changed dynamically.
+
+The allocation assignment status of each shard copy is tracked by its [ShardRoutingState][]. The `RoutingTable` and `RoutingNodes` objects
+are responsible for tracking the data nodes to which each shard in the cluster is allocated: see the [routing package javadoc][] for more
+details about these structures.
+
+[routing package javadoc]: https://github.com/elastic/elasticsearch/blob/v9.0.0-beta1/server/src/main/java/org/elasticsearch/cluster/routing/package-info.java
+[ShardRoutingState]: https://github.com/elastic/elasticsearch/blob/4c9c82418ed98613edcd91e4d8f818eeec73ce92/server/src/main/java/org/elasticsearch/cluster/routing/ShardRoutingState.java#L12-L46
+
 ### Core Components

 The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce
@ -235,6 +247,22 @@ of shards, and an incentive to distribute shards within the same index across di
 `NodeAllocationStatsAndWeightsCalculator` classes for more details on the weight calculations that support the `DesiredBalanceComputer`
 decisions.

+### Inter-Node Communicaton
+
+The elected master node creates a shard allocation plan with the `DesiredBalanceShardsAllocator` and then selects incremental shard
+movements towards the target allocation plan with the `DesiredBalanceReconciler`. The results of the `DesiredBalanceReconciler` is an
+updated `RoutingTable`. The `RoutingTable` is part of the cluster state, so the master node updates the cluster state with the new
+(incremental) desired shard allocation information. The updated cluster state is then published to the data nodes. Each data node will
+observe any change in shard allocation related to itself and take action to achieve the new shard allocation by: initiating creation of a
+new empty shard; starting recovery (copying) of an existing shard from another data node; or removing a shard. When the data node finishes
+a shard change, a request is sent to the master node to update the shard as having finished recovery/removal in the cluster state. The
+cluster state is used by allocation as a fancy work queue: the master node conveys new work to the data nodes, which pick up the work and
+report back when done.
+
+- See `DesiredBalanceShardsAllocator#submitReconcileTask` for the master node's cluster state update post-reconciliation.
+- See `IndicesClusterStateService#doApplyClusterState` for the data node hook to observe shard changes in the cluster state.
+- See `ShardStateAction#sendShardAction` for the data node request to the master node on completion of a shard state change.
+
 # Autoscaling

 The Autoscaling API in ES (Elasticsearch) uses cluster and node level statistics to provide a recommendation