From 72b4ed255bfaba98a615d57bade1e26d32094f54 Mon Sep 17 00:00:00 2001
From: Dianna Hohensee <dianna.hohensee@elastic.co>
Date: Fri, 18 Apr 2025 14:56:27 -0400
Subject: [PATCH] Add to allocation architecture guide (#125328)

How master and data nodes communicate
about shard allocation
---
 docs/internal/DistributedArchitectureGuide.md | 28 +++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/docs/internal/DistributedArchitectureGuide.md b/docs/internal/DistributedArchitectureGuide.md
index d2c5536ac440..1fa0d267f2ef 100644
--- a/docs/internal/DistributedArchitectureGuide.md
+++ b/docs/internal/DistributedArchitectureGuide.md
@@ -195,6 +195,18 @@ works in parallel with the storage engine.)
 
 # Allocation
 
+### Indexes and Shards
+
+Each index consists of a fixed number of primary shards. The number of primary shards cannot be changed for the lifetime of the index. Each
+primary shard can have zero-to-many replicas used for data redundancy. The number of replicas per shard can be changed dynamically.
+
+The allocation assignment status of each shard copy is tracked by its [ShardRoutingState][]. The `RoutingTable` and `RoutingNodes` objects
+are responsible for tracking the data nodes to which each shard in the cluster is allocated: see the [routing package javadoc][] for more
+details about these structures.
+
+[routing package javadoc]: https://github.com/elastic/elasticsearch/blob/v9.0.0-beta1/server/src/main/java/org/elasticsearch/cluster/routing/package-info.java
+[ShardRoutingState]: https://github.com/elastic/elasticsearch/blob/4c9c82418ed98613edcd91e4d8f818eeec73ce92/server/src/main/java/org/elasticsearch/cluster/routing/ShardRoutingState.java#L12-L46
+
 ### Core Components
 
 The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce
@@ -235,6 +247,22 @@ of shards, and an incentive to distribute shards within the same index across di
 `NodeAllocationStatsAndWeightsCalculator` classes for more details on the weight calculations that support the `DesiredBalanceComputer`
 decisions.
 
+### Inter-Node Communicaton
+
+The elected master node creates a shard allocation plan with the `DesiredBalanceShardsAllocator` and then selects incremental shard
+movements towards the target allocation plan with the `DesiredBalanceReconciler`. The results of the `DesiredBalanceReconciler` is an
+updated `RoutingTable`. The `RoutingTable` is part of the cluster state, so the master node updates the cluster state with the new
+(incremental) desired shard allocation information. The updated cluster state is then published to the data nodes. Each data node will
+observe any change in shard allocation related to itself and take action to achieve the new shard allocation by: initiating creation of a
+new empty shard; starting recovery (copying) of an existing shard from another data node; or removing a shard. When the data node finishes
+a shard change, a request is sent to the master node to update the shard as having finished recovery/removal in the cluster state. The
+cluster state is used by allocation as a fancy work queue: the master node conveys new work to the data nodes, which pick up the work and
+report back when done.
+
+- See `DesiredBalanceShardsAllocator#submitReconcileTask` for the master node's cluster state update post-reconciliation.
+- See `IndicesClusterStateService#doApplyClusterState` for the data node hook to observe shard changes in the cluster state.
+- See `ShardStateAction#sendShardAction` for the data node request to the master node on completion of a shard state change.
+
 # Autoscaling
 
 The Autoscaling API in ES (Elasticsearch) uses cluster and node level statistics to provide a recommendation