mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-24 23:27:25 -04:00
More balanced docs about NFS etc (#85060)
Today we don't really say anything about the requirements for the data path in terms of correctness, and we specifically say to avoid NFS for performance reasons. This isn't wholly accurate: some NFS implementations work just fine. This commit documents a more balanced position on local vs remote storage.
This commit is contained in:
parent b0ab9394c7
commit ff742fcb27

3 changed files with 41 additions and 21 deletions
@@ -96,16 +96,11 @@ faster.
 [discrete]
 === Use faster hardware
 
-If indexing is I/O bound, you should investigate giving more memory to the
-filesystem cache (see above) or buying faster drives. In particular SSD drives
-are known to perform better than spinning disks. Always use local storage,
-remote filesystems such as `NFS` or `SMB` should be avoided. Also beware of
-virtualized storage such as Amazon's `Elastic Block Storage`. Virtualized
-storage works very well with Elasticsearch, and it is appealing since it is so
-fast and simple to set up, but it is also unfortunately inherently slower on an
-ongoing basis when compared to dedicated local storage. If you put an index on
-`EBS`, be sure to use provisioned IOPS otherwise operations could be quickly
-throttled.
+If indexing is I/O-bound, consider increasing the size of the filesystem cache
+(see above) or using faster storage. Elasticsearch generally creates individual
+files with sequential writes. However, indexing involves writing multiple files
+concurrently, and a mix of random and sequential reads too, so SSD drives tend
+to perform better than spinning disks.
 
 Stripe your index across multiple SSDs by configuring a RAID 0 array. Remember
 that it will increase the risk of failure since the failure of any one SSD
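The hunk above recommends striping the index across multiple SSDs with RAID 0. As a minimal sketch only, assuming a Linux host with two spare NVMe drives (the device names and mount point below are placeholders), such an array could be built with `mdadm`:

----
# Placeholder devices: adjust /dev/nvme0n1 and /dev/nvme1n1 for your system.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/es-data
sudo mount /dev/md0 /mnt/es-data
# RAID 0 has no redundancy: as the surrounding text says, rely on replicas and snapshots.
----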
@@ -115,6 +110,14 @@ different nodes so there's redundancy for any node failures. You can also use
 <<modules-snapshots,snapshot and restore>> to backup the index for further
 insurance.
 
+Directly-attached (local) storage generally performs better than remote storage
+because it is simpler to configure well and avoids communications overheads.
+With careful tuning it is sometimes possible to achieve acceptable performance
+using remote storage too. Benchmark your system with a realistic workload to
+determine the effects of any tuning parameters. If you cannot achieve the
+performance you expect, work with the vendor of your storage system to identify
+the problem.
+
 [discrete]
 === Indexing buffer size
 
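The added paragraph tells readers to benchmark the system with a realistic workload before trusting any tuning. A raw-storage starting point, sketched here with `fio` (directory and job parameters are illustrative only), exercises the random-access component of the concurrent I/O pattern described for indexing:

----
# Illustrative fio job against the prospective data path (placeholder directory).
fio --name=es-mixed --directory=/mnt/es-data \
    --rw=randrw --rwmixread=70 --bs=4k --ioengine=libaio --direct=1 \
    --numjobs=4 --iodepth=32 --size=2g --runtime=120 --time_based --group_reporting
----

Low-level numbers like these only bound the problem; the paragraph's point is that a realistic, Elasticsearch-shaped workload is what ultimately matters.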
@@ -12,18 +12,21 @@ index in physical memory.
 [discrete]
 === Use faster hardware
 
-If your search is I/O bound, you should investigate giving more memory to the
-filesystem cache (see above) or buying faster drives. In particular SSD drives
-are known to perform better than spinning disks. Always use local storage,
-remote filesystems such as `NFS` or `SMB` should be avoided. Also beware of
-virtualized storage such as Amazon's `Elastic Block Storage`. Virtualized
-storage works very well with Elasticsearch, and it is appealing since it is so
-fast and simple to set up, but it is also unfortunately inherently slower on an
-ongoing basis when compared to dedicated local storage. If you put an index on
-`EBS`, be sure to use provisioned IOPS otherwise operations could be quickly
-throttled.
+If your searches are I/O-bound, consider increasing the size of the filesystem
+cache (see above) or using faster storage. Each search involves a mix of
+sequential and random reads across multiple files, and there may be many
+searches running concurrently on each shard, so SSD drives tend to perform
+better than spinning disks.
 
-If your search is CPU-bound, you should investigate buying faster CPUs.
+Directly-attached (local) storage generally performs better than remote storage
+because it is simpler to configure well and avoids communications overheads.
+With careful tuning it is sometimes possible to achieve acceptable performance
+using remote storage too. Benchmark your system with a realistic workload to
+determine the effects of any tuning parameters. If you cannot achieve the
+performance you expect, work with the vendor of your storage system to identify
+the problem.
+
+If your searches are CPU-bound, consider using a larger number of faster CPUs.
 
 [discrete]
 === Document modeling
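For a workload-level benchmark of a running cluster, one option (not named in the commit) is Elastic's Rally tool. This is only a sketch, assuming Rally (`esrally`) is installed; the track name and target host are placeholders and the exact flags vary by Rally version:

----
# Sketch: drive an existing cluster with a Rally track. With the benchmark-only
# pipeline, Rally measures the cluster you point it at rather than provisioning its own.
esrally race --track=pmc --target-hosts=127.0.0.1:9200 --pipeline=benchmark-only
----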
@@ -454,6 +454,20 @@ Like all node settings, it can also be specified on the command line as:
 ./bin/elasticsearch -Epath.data=/var/elasticsearch/data
 ----
 
+The contents of the `path.data` directory must persist across restarts, because
+this is where your data is stored. {es} requires the filesystem to act as if it
+were backed by a local disk, but this means that it will work correctly on
+properly-configured remote block devices (e.g. a SAN) and remote filesystems
+(e.g. NFS) as long as the remote storage behaves no differently from local
+storage. You can run multiple {es} nodes on the same filesystem, but each {es}
+node must have its own data path.
+
+The performance of an {es} cluster is often limited by the performance of the
+underlying storage, so you must ensure that your storage supports acceptable
+performance. Some remote storage performs very poorly, especially under the
+kind of load that {es} imposes, so make sure to benchmark your system carefully
+before committing to a particular storage architecture.
+
 TIP: When using the `.zip` or `.tar.gz` distributions, the `path.data` setting
 should be configured to locate the data directory outside the {es} home
 directory, so that the home directory can be deleted without deleting your data!
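Acting on the TIP above for an archive install comes down to pointing `path.data` outside the extracted home directory. A minimal sketch, with all paths hypothetical and assuming the data directory is writable by the user running Elasticsearch:

----
# Hypothetical layout: archive extracted to ./elasticsearch-home, data kept outside it.
mkdir -p /var/lib/elasticsearch-data
echo 'path.data: /var/lib/elasticsearch-data' >> ./elasticsearch-home/config/elasticsearch.yml
./elasticsearch-home/bin/elasticsearch
----

With the data held outside the home directory, deleting or replacing the home directory does not delete the data, which is the risk the TIP warns about.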