From 72b4ed255bfaba98a615d57bade1e26d32094f54 Mon Sep 17 00:00:00 2001 From: Dianna Hohensee Date: Fri, 18 Apr 2025 14:56:27 -0400 Subject: [PATCH] Add to allocation architecture guide (#125328) How master and data nodes communicate about shard allocation --- docs/internal/DistributedArchitectureGuide.md | 28 +++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/docs/internal/DistributedArchitectureGuide.md b/docs/internal/DistributedArchitectureGuide.md index d2c5536ac440..1fa0d267f2ef 100644 --- a/docs/internal/DistributedArchitectureGuide.md +++ b/docs/internal/DistributedArchitectureGuide.md @@ -195,6 +195,18 @@ works in parallel with the storage engine.) # Allocation +### Indexes and Shards + +Each index consists of a fixed number of primary shards. The number of primary shards cannot be changed for the lifetime of the index. Each +primary shard can have zero-to-many replicas used for data redundancy. The number of replicas per shard can be changed dynamically. + +The allocation assignment status of each shard copy is tracked by its [ShardRoutingState][]. The `RoutingTable` and `RoutingNodes` objects +are responsible for tracking the data nodes to which each shard in the cluster is allocated: see the [routing package javadoc][] for more +details about these structures. + +[routing package javadoc]: https://github.com/elastic/elasticsearch/blob/v9.0.0-beta1/server/src/main/java/org/elasticsearch/cluster/routing/package-info.java +[ShardRoutingState]: https://github.com/elastic/elasticsearch/blob/4c9c82418ed98613edcd91e4d8f818eeec73ce92/server/src/main/java/org/elasticsearch/cluster/routing/ShardRoutingState.java#L12-L46 + ### Core Components The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce @@ -235,6 +247,22 @@ of shards, and an incentive to distribute shards within the same index across di `NodeAllocationStatsAndWeightsCalculator` classes for more details on the weight calculations that support the `DesiredBalanceComputer` decisions. +### Inter-Node Communicaton + +The elected master node creates a shard allocation plan with the `DesiredBalanceShardsAllocator` and then selects incremental shard +movements towards the target allocation plan with the `DesiredBalanceReconciler`. The results of the `DesiredBalanceReconciler` is an +updated `RoutingTable`. The `RoutingTable` is part of the cluster state, so the master node updates the cluster state with the new +(incremental) desired shard allocation information. The updated cluster state is then published to the data nodes. Each data node will +observe any change in shard allocation related to itself and take action to achieve the new shard allocation by: initiating creation of a +new empty shard; starting recovery (copying) of an existing shard from another data node; or removing a shard. When the data node finishes +a shard change, a request is sent to the master node to update the shard as having finished recovery/removal in the cluster state. The +cluster state is used by allocation as a fancy work queue: the master node conveys new work to the data nodes, which pick up the work and +report back when done. + +- See `DesiredBalanceShardsAllocator#submitReconcileTask` for the master node's cluster state update post-reconciliation. +- See `IndicesClusterStateService#doApplyClusterState` for the data node hook to observe shard changes in the cluster state. +- See `ShardStateAction#sendShardAction` for the data node request to the master node on completion of a shard state change. + # Autoscaling The Autoscaling API in ES (Elasticsearch) uses cluster and node level statistics to provide a recommendation