[DOCS] Address local vs. remote storage + shard limits feedback (#109360)

This commit is contained in:
shainaraskas 2024-06-12 13:50:23 -04:00 committed by GitHub
parent 47edae4fbd
commit 900eb82c99
7 changed files with 40 additions and 26 deletions


@@ -22,6 +22,9 @@ mounted indices>> of <<ilm-searchable-snapshot,{search-snaps}>> exclusively.
This extends the storage capacity even further — by up to 20 times compared to
the warm tier.
TIP: The performance of an {es} node is often limited by the performance of the underlying storage.
Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and <<search-use-faster-hardware,search>>.
IMPORTANT: {es} generally expects nodes within a data tier to share the same
hardware profile. Variations not following this recommendation should be
carefully architected to avoid <<hotspotting,hot spotting>>.


@@ -94,6 +94,7 @@ auto-generated ids, Elasticsearch can skip this check, which makes indexing
faster.
[discrete]
[[indexing-use-faster-hardware]]
=== Use faster hardware
If indexing is I/O-bound, consider increasing the size of the filesystem cache
@@ -110,13 +111,10 @@ different nodes so there's redundancy for any node failures. You can also use
<<snapshot-restore,snapshot and restore>> to backup the index for further
insurance.
Directly-attached (local) storage generally performs better than remote storage
because it is simpler to configure well and avoids communications overheads.
With careful tuning it is sometimes possible to achieve acceptable performance
using remote storage too. Benchmark your system with a realistic workload to
determine the effects of any tuning parameters. If you cannot achieve the
performance you expect, work with the vendor of your storage system to identify
the problem.
[discrete]
==== Local vs. remote storage
include::./remote-storage.asciidoc[]
[discrete]
=== Indexing buffer size


@@ -0,0 +1,11 @@
Directly-attached (local) storage generally performs
better than remote storage because it is simpler to configure well and avoids
communications overheads.
Some remote storage performs very poorly, especially
under the kind of load that {es} imposes. However, with careful tuning, it is
sometimes possible to achieve acceptable performance using remote storage too.
Before committing to a particular storage architecture, benchmark your system
with a realistic workload to determine the effects of any tuning parameters. If
you cannot achieve the performance you expect, work with the vendor of your
storage system to identify the problem.


@@ -38,6 +38,7 @@ for `/dev/nvme0n1`, specify `blockdev --setra 256 /dev/nvme0n1`.
// end::readahead[]
[discrete]
[[search-use-faster-hardware]]
=== Use faster hardware
If your searches are I/O-bound, consider increasing the size of the filesystem
@@ -46,16 +47,13 @@ sequential and random reads across multiple files, and there may be many
searches running concurrently on each shard, so SSD drives tend to perform
better than spinning disks.
Directly-attached (local) storage generally performs better than remote storage
because it is simpler to configure well and avoids communications overheads.
With careful tuning it is sometimes possible to achieve acceptable performance
using remote storage too. Benchmark your system with a realistic workload to
determine the effects of any tuning parameters. If you cannot achieve the
performance you expect, work with the vendor of your storage system to identify
the problem.
If your searches are CPU-bound, consider using a larger number of faster CPUs.
[discrete]
==== Local vs. remote storage
include::./remote-storage.asciidoc[]
[discrete]
=== Document modeling


@@ -0,0 +1,4 @@
<<cluster-shard-limit,Cluster shard limits>> prevent creation of more than
1000 non-frozen shards per node, and 3000 frozen shards per dedicated frozen
node. Make sure you have enough nodes of each type in your cluster to handle
the number of shards you need.
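As an illustrative sketch (not part of this change — the `filter_path` selections are assumptions for brevity), you can compare a cluster's current shard count against the configured limits with the cluster health and settings APIs:

[source,console]
----
GET _cluster/health?filter_path=active_shards

GET _cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node*
----

The second request surfaces the `cluster.max_shards_per_node` and `cluster.max_shards_per_node.frozen` settings that back these limits.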


@@ -34,6 +34,9 @@ cluster sizing video]. As you test different shard configurations, use {kib}'s
{kibana-ref}/elasticsearch-metrics.html[{es} monitoring tools] to track your
cluster's stability and performance.
The performance of an {es} node is often limited by the performance of the underlying storage.
Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and <<search-use-faster-hardware,search>>.
The following sections provide some reminders and guidelines you should
consider when designing your sharding strategy. If your cluster is already
oversharded, see <<reduce-cluster-shard-count>>.
@@ -225,10 +228,7 @@ GET _cat/shards?v=true
[[shard-count-per-node-recommendation]]
==== Add enough nodes to stay within the cluster shard limits
The <<cluster-shard-limit,cluster shard limits>> prevent creation of more than
1000 non-frozen shards per node, and 3000 frozen shards per dedicated frozen
node. Make sure you have enough nodes of each type in your cluster to handle
the number of shards you need.
include::./shard-limits.asciidoc[]
[discrete]
[[field-count-recommendation]]


@@ -1,5 +1,5 @@
[[modules-node]]
=== Node
=== Nodes
Any time that you start an instance of {es}, you are starting a _node_. A
collection of connected nodes is called a <<modules-cluster,cluster>>. If you
@@ -14,6 +14,10 @@ All nodes know about all the other nodes in the cluster and can forward client
requests to the appropriate node.
// end::modules-node-description-tag[]
TIP: The performance of an {es} node is often limited by the performance of the underlying storage.
Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and
<<search-use-faster-hardware,search>>.
[[node-roles]]
==== Node roles
@@ -236,6 +240,8 @@ assign data nodes to specific tiers: `data_content`, `data_hot`, `data_warm`,
If you want to include a node in all tiers, or if your cluster does not use multiple tiers, then you can use the generic `data` role.
include::../how-to/shard-limits.asciidoc[]
WARNING: If you assign a node to a specific tier using a specialized data role, then you shouldn't also assign it the generic `data` role. The generic `data` role takes precedence over specialized data roles.
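A minimal sketch of assigning a node to specific tiers (the role combination here is an assumed example, not a recommendation from this change) in `elasticsearch.yml`:

[source,yaml]
----
# A node dedicated to the hot and content tiers.
# Do not also list the generic `data` role here — it would take
# precedence over the specialized data roles.
node.roles: [ data_hot, data_content ]
----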
[[generic-data-node]]
@@ -471,12 +477,6 @@ properly-configured remote block devices (e.g. a SAN) and remote filesystems
storage. You can run multiple {es} nodes on the same filesystem, but each {es}
node must have its own data path.
The performance of an {es} cluster is often limited by the performance of the
underlying storage, so you must ensure that your storage supports acceptable
performance. Some remote storage performs very poorly, especially under the
kind of load that {es} imposes, so make sure to benchmark your system carefully
before committing to a particular storage architecture.
TIP: When using the `.zip` or `.tar.gz` distributions, the `path.data` setting
should be configured to locate the data directory outside the {es} home
directory, so that the home directory can be deleted without deleting your data!
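For example, a minimal `elasticsearch.yml` fragment following this tip (the path shown is an assumed example):

[source,yaml]
----
# Keep the data directory outside the {es} home directory so the
# home directory can be deleted on upgrade without losing data.
path.data: /var/data/elasticsearch
----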