[DOCS] Address local vs. remote storage + shard limits feedback (#109360)

This commit is contained in:
shainaraskas 2024-06-12 13:50:23 -04:00 committed by GitHub
parent 47edae4fbd
commit 900eb82c99
7 changed files with 40 additions and 26 deletions


@@ -22,6 +22,9 @@ mounted indices>> of <<ilm-searchable-snapshot,{search-snaps}>> exclusively.
This extends the storage capacity even further — by up to 20 times compared to
the warm tier.
TIP: The performance of an {es} node is often limited by the performance of the underlying storage.
Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and <<search-use-faster-hardware,search>>.
IMPORTANT: {es} generally expects nodes within a data tier to share the same
hardware profile. Variations not following this recommendation should be
carefully architected to avoid <<hotspotting,hot spotting>>.


@@ -94,6 +94,7 @@ auto-generated ids, Elasticsearch can skip this check, which makes indexing
faster.
[discrete]
[[indexing-use-faster-hardware]]
=== Use faster hardware
If indexing is I/O-bound, consider increasing the size of the filesystem cache
@@ -110,13 +111,10 @@ different nodes so there's redundancy for any node failures. You can also use
<<snapshot-restore,snapshot and restore>> to backup the index for further
insurance.
Directly-attached (local) storage generally performs better than remote storage
because it is simpler to configure well and avoids communications overheads.
With careful tuning it is sometimes possible to achieve acceptable performance
using remote storage too. Benchmark your system with a realistic workload to
determine the effects of any tuning parameters. If you cannot achieve the
performance you expect, work with the vendor of your storage system to identify
the problem.
[discrete]
==== Local vs. remote storage
include::./remote-storage.asciidoc[]
[discrete]
=== Indexing buffer size


@@ -0,0 +1,11 @@
Directly-attached (local) storage generally performs
better than remote storage because it is simpler to configure well and avoids
communications overheads.
Some remote storage performs very poorly, especially
under the kind of load that {es} imposes. However, with careful tuning, it is
sometimes possible to achieve acceptable performance using remote storage too.
Before committing to a particular storage architecture, benchmark your system
with a realistic workload to determine the effects of any tuning parameters. If
you cannot achieve the performance you expect, work with the vendor of your
storage system to identify the problem.


@@ -38,6 +38,7 @@ for `/dev/nvme0n1`, specify `blockdev --setra 256 /dev/nvme0n1`.
// end::readahead[]
[discrete]
[[search-use-faster-hardware]]
=== Use faster hardware
If your searches are I/O-bound, consider increasing the size of the filesystem
@@ -46,16 +47,13 @@ sequential and random reads across multiple files, and there may be many
searches running concurrently on each shard, so SSD drives tend to perform
better than spinning disks.
Directly-attached (local) storage generally performs better than remote storage
because it is simpler to configure well and avoids communications overheads.
With careful tuning it is sometimes possible to achieve acceptable performance
using remote storage too. Benchmark your system with a realistic workload to
determine the effects of any tuning parameters. If you cannot achieve the
performance you expect, work with the vendor of your storage system to identify
the problem.
If your searches are CPU-bound, consider using a larger number of faster CPUs.
[discrete]
==== Local vs. remote storage
include::./remote-storage.asciidoc[]
[discrete]
=== Document modeling


@@ -0,0 +1,4 @@
<<cluster-shard-limit,Cluster shard limits>> prevent creation of more than
1000 non-frozen shards per node, and 3000 frozen shards per dedicated frozen
node. Make sure you have enough nodes of each type in your cluster to handle
the number of shards you need.
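As an illustrative sketch (not part of this change — the `filter_path` selections are assumptions for brevity), you can compare a cluster's current shard count against the configured limits with the cluster health and settings APIs:

[source,console]
----
GET _cluster/health?filter_path=active_shards

GET _cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node*
----

The second request surfaces the `cluster.max_shards_per_node` and `cluster.max_shards_per_node.frozen` settings that back these limits.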


@@ -34,6 +34,9 @@ cluster sizing video]. As you test different shard configurations, use {kib}'s
{kibana-ref}/elasticsearch-metrics.html[{es} monitoring tools] to track your
cluster's stability and performance.
The performance of an {es} node is often limited by the performance of the underlying storage.
Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and <<search-use-faster-hardware,search>>.
The following sections provide some reminders and guidelines you should
consider when designing your sharding strategy. If your cluster is already
oversharded, see <<reduce-cluster-shard-count>>.
@@ -225,10 +228,7 @@ GET _cat/shards?v=true
[[shard-count-per-node-recommendation]]
==== Add enough nodes to stay within the cluster shard limits
The <<cluster-shard-limit,cluster shard limits>> prevent creation of more than
1000 non-frozen shards per node, and 3000 frozen shards per dedicated frozen
node. Make sure you have enough nodes of each type in your cluster to handle
the number of shards you need.
include::./shard-limits.asciidoc[]
[discrete]
[[field-count-recommendation]]


@@ -1,5 +1,5 @@
[[modules-node]]
=== Node
=== Nodes
Any time that you start an instance of {es}, you are starting a _node_. A
collection of connected nodes is called a <<modules-cluster,cluster>>. If you
@@ -14,6 +14,10 @@ All nodes know about all the other nodes in the cluster and can forward client
requests to the appropriate node.
// end::modules-node-description-tag[]
TIP: The performance of an {es} node is often limited by the performance of the underlying storage.
Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and
<<search-use-faster-hardware,search>>.
[[node-roles]]
==== Node roles
@@ -236,6 +240,8 @@ assign data nodes to specific tiers: `data_content`, `data_hot`, `data_warm`,
If you want to include a node in all tiers, or if your cluster does not use multiple tiers, then you can use the generic `data` role.
include::../how-to/shard-limits.asciidoc[]
WARNING: If you assign a node to a specific tier using a specialized data role, then you shouldn't also assign it the generic `data` role. The generic `data` role takes precedence over specialized data roles.
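A minimal sketch of assigning a node to specific tiers (the role combination here is an assumed example, not a recommendation from this change) in `elasticsearch.yml`:

[source,yaml]
----
# A node dedicated to the hot and content tiers.
# Do not also list the generic `data` role here — it would take
# precedence over the specialized data roles.
node.roles: [ data_hot, data_content ]
----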
[[generic-data-node]]
@@ -471,12 +477,6 @@ properly-configured remote block devices (e.g. a SAN) and remote filesystems
storage. You can run multiple {es} nodes on the same filesystem, but each {es}
node must have its own data path.
The performance of an {es} cluster is often limited by the performance of the
underlying storage, so you must ensure that your storage supports acceptable
performance. Some remote storage performs very poorly, especially under the
kind of load that {es} imposes, so make sure to benchmark your system carefully
before committing to a particular storage architecture.
TIP: When using the `.zip` or `.tar.gz` distributions, the `path.data` setting
should be configured to locate the data directory outside the {es} home
directory, so that the home directory can be deleted without deleting your data!
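For example, a minimal `elasticsearch.yml` fragment following this tip (the path shown is an assumed example):

[source,yaml]
----
# Keep the data directory outside the {es} home directory so the
# home directory can be deleted on upgrade without losing data.
path.data: /var/data/elasticsearch
----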