898 commits

Author SHA1 Message Date
David Turner
ffa8224db7
Add note about incompleteness of CBs (#116176) (#116187)
The docs kinda imply that circuit breakers protect against OOMEs, at
least that's how some customers seem to interpret them. This commit adds
a note spelling out that this isn't the case.
2024-11-05 03:35:10 +11:00
shainaraskas
551011c7ab
[DOCS] Clarify behavior of the generic data node role (#106375) (#109692)
(cherry picked from commit 82d7e4ec93)
2024-06-13 15:05:01 -04:00
David Turner
10ca010c44 Allocation awareness allocates some replicas (#104800)
The docs for forced awareness indicate that no replicas will be assigned
until all zones are available, which is definitely undesirable and also
not the actual behaviour. This commit fixes the wording to match what
really happens.

Closes #104777
2024-01-29 08:19:04 +00:00
Sylvain Wallez
040c454f0b
Fixes CORS headers needed by Elastic clients (#85791) (#93659)
* Fixes CORS headers needed by Elastic clients

Updates the default value for the `http.cors.allow-headers`
setting to include headers used by Elastic client libraries.

Also adds the `access-control-expose-headers` header to responses to
CORS requests so that clients can successfully perform their product
check.

(cherry picked from commit 484d3f4ada)
2023-02-09 18:42:58 +01:00
David Turner
b611fe4bb5 More opinionated docs about http.max_content_length (#90500)
Adds to the docs a note that the `100mb` default for
`http.max_content_length` is the recommended maximum, along with
suggestions for what to do when hitting this limit.
2022-09-29 16:11:39 +01:00
David Turner
7b5ccbb1ac
Weaken language about "low-latency" networks (#89198) (#89201)
Today we say that voting-only nodes require a "low-latency" network.
This term has a specific meaning in some operating environments which is
different from our intended meaning. To avoid this confusion this commit
removes the absolute term "low-latency" in favour of describing the
requirements relative to the user's own performance goals.
2022-08-09 22:01:45 +09:30
David Turner
e7ffa050ed
More docs re. removing cluster.initial_master_nodes (#85948) (#85981)
Ensures that on every page of the docs that mentions
`cluster.initial_master_nodes` also mentions that this setting must be
removed after bootstrapping completes.
2022-04-19 03:13:32 -04:00
debadair
1ed056d58b
[DOCS] Reuse data tier content in node role docs (#84346) (#85419)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2022-03-28 16:47:40 -07:00
James Rodewig
53ed187d63
[DOCS] Fix typos (#83895) (#83975)
Co-authored-by: Tobias Stadler <ts.stadler@gmx.de>
2022-02-15 13:05:01 -05:00
James Rodewig
2cc8a583ff Fix table headings in 7.17 2022-02-15 09:13:50 -05:00
Tanguy Leroux
07b995152e
[7.17.1] Adjust indices.recovery.max_bytes_per_sec according to external settings (#83413)
* Adjust indices.recovery.max_bytes_per_sec according to external settings

Today the setting indices.recovery.max_bytes_per_sec defaults to different
values depending on the node roles, the JVM version and the system total
memory that can be detected.

The current logic to set the default value can be summarized as:

    40 MB for non-data nodes
    40 MB for data nodes that run on a JVM version < 14
    40 MB for data nodes that have one of the data_hot, data_warm, data_content or data roles

Nodes with only data_cold and/or data_frozen roles as data roles have a
default value that depends on the available memory:

    with ≤ 4 GB of available memory, the default is 40 MB
    with more than 4 GB and less than or equal to 8 GB, the default is 60 MB
    with more than 8 GB and less than or equal to 16 GB, the default is 90 MB
    with more than 16 GB and less than or equal to 32 GB, the default is 125 MB
    and above 32 GB, the default is 250 MB
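
The rules listed above can be sketched as a small lookup. This is only an illustration of the description in this commit message, not the Elasticsearch implementation; the function name and its parameters are hypothetical.

```python
# Roles that keep the 40 MB default regardless of memory (from the list above).
NON_COLD_DATA_ROLES = {"data", "data_hot", "data_warm", "data_content"}
COLD_DATA_ROLES = {"data_cold", "data_frozen"}


def default_recovery_max_mb(roles, jvm_major_version, total_memory_gb):
    """Return the default limit in MB for a node (hypothetical helper)."""
    roles = set(roles)
    has_data_role = bool(roles & (NON_COLD_DATA_ROLES | COLD_DATA_ROLES))
    # Non-data nodes and data nodes on a JVM older than 14 use 40 MB.
    if not has_data_role or jvm_major_version < 14:
        return 40
    # Any hot, warm, content or generic data role also keeps 40 MB.
    if roles & NON_COLD_DATA_ROLES:
        return 40
    # Only data_cold and/or data_frozen: scale with available memory.
    if total_memory_gb <= 4:
        return 40
    if total_memory_gb <= 8:
        return 60
    if total_memory_gb <= 16:
        return 90
    if total_memory_gb <= 32:
        return 125
    return 250
```

For example, `default_recovery_max_mb({"data_frozen"}, 17, 8)` falls in the "more than 4 GB and up to 8 GB" bucket.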

While those defaults served us well, we want to evaluate if we can define
more appropriate defaults if Elasticsearch were to know better the limits
(or properties) of the hardware it is running on - something that Elasticsearch
cannot extract by itself but can derive from settings that are provided at startup.

This pull request introduces the following new node settings:

    node.bandwidth.recovery.network
    node.bandwidth.recovery.disk.read
    node.bandwidth.recovery.disk.write

Those settings are not dynamic and must be set before the node starts.
When they are set Elasticsearch detects the minimum available bandwidth
among the network, disk read and disk write available bandwidths and computes
a maximum bytes-per-second limit that will be a fraction of the min. available
bandwidth. By default 40% of the min. bandwidth is used but that can be
dynamically configured by an operator
(using the node.bandwidth.recovery.operator.factor setting) or by the user
directly (using a different setting node.bandwidth.recovery.factor).

The limit computed from available bandwidths is then compared to pre-existing
limitations like the one set through the indices.recovery.max_bytes_per_sec setting
or the one that is computed by Elasticsearch from the node's physical memory
on dedicated cold/frozen nodes. Elasticsearch will try to use the highest possible
limit among those values, while not exceeding an overcommit ratio that is also
defined through a node setting
(see node.bandwidth.recovery.operator.factor.max_overcommit).

This overcommit ratio is here to prevent the rate limit from being set to a value that is
greater than 100 times (by default) the minimum available bandwidth.
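
One possible reading of the computation described above, as a rough sketch. The function and parameter names are hypothetical, the way the candidate limits are combined is an assumption drawn from this text alone, and the real precedence rules in Elasticsearch may differ.

```python
def recovery_rate_limit(network, disk_read, disk_write,
                        other_limits=(), factor=0.4, max_overcommit=100.0):
    """Sketch of the bandwidth-based limit; all values share one unit."""
    # The slowest of the three bandwidths bounds what a recovery can use.
    min_bandwidth = min(network, disk_read, disk_write)
    # By default 40% of the minimum bandwidth is used.
    bandwidth_limit = factor * min_bandwidth
    # Assumption: the highest of the candidate limits wins...
    candidate = max([bandwidth_limit, *other_limits])
    # ...but never more than max_overcommit times the minimum bandwidth.
    return min(candidate, max_overcommit * min_bandwidth)
```

Here `other_limits` stands in for the pre-existing limits mentioned above, such as an explicit `indices.recovery.max_bytes_per_sec` value or the memory-based default.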

Backport of #82819 for 7.17.1

* Add missing max overcommit factor to list of (dynamic) settings (#83350)

The setting node.bandwidth.recovery.operator.factor.max_overcommit
wasn't added to the list of cluster settings and to the list of settings to
consume for updates.

Relates #82819

* Operator factor settings should have the OperatorDynamic setting property (#83359)

Relates #82819

* Add docs for node bandwidth settings (#83361)

Relates #82819

* Adjust for 7.17.1

* remove draft

* remove docs/changelog/83350.yaml

Co-authored-by: David Turner <david.turner@elastic.co>
2022-02-09 06:33:05 -05:00
James Rodewig
146110fb67
[DOCS] Fix typo (#82100) (#82554)
Fix typo under `indices.recovery.max_concurrent_snapshot_file_downloads_per_node`

(cherry picked from commit 58ffc42f5f)

Co-authored-by: erictung1999 <41339955+erictung1999@users.noreply.github.com>
2022-01-13 09:54:03 -05:00
James Rodewig
2bca511d3c
[DOCS] Correct yaml syntax in example configuration (#82297) (#82394)
(cherry picked from commit 432fd79c46)

Co-authored-by: mymindstorm <mymindstorm@evermiss.net>
2022-01-10 17:20:07 -05:00
James Rodewig
afb07c446c
[DOCS] Add prerequisites for CCS (#81782) (#82368)
* Adds a prerequisites section covering remote cluster config, node roles, and security.
* Moves existing content about remote cluster config to the prereqs.
* Updates the remote cluster docs to include information about eligible gateway nodes and tagging for gateway nodes.

Closes https://github.com/elastic/elasticsearch/issues/72001

(cherry picked from commit 7142b47e69)
2022-01-10 09:31:32 -05:00
James Rodewig
b94e3a9aa0
[DOCS] Thread pool settings are static (#81887) (#81947)
Starting in 5.1, thread pools can no longer be dynamically updated; see the [doc](https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking_50_settings_changes.html#_threadpool_settings).

Co-authored-by: Stef Nestor <steffanie.nestor@gmail.com>
2021-12-20 11:32:26 -05:00
James Rodewig
ed4b3213a5
[7.17] [DOCS] Update remote cluster version compatibility table for 7.17 (#81239) (#81826)
Updates the remote clusters version compatibility table to include 7.17.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>
2021-12-16 13:22:36 -05:00
Joe Gallo
1b8c867785
_tier_preference docs changes for 7.16 (#81401) 2021-12-07 11:03:41 -05:00
David Turner
71a5b00074
Expand warning about modifying data path contents (#79649) (#79656)
Today we have a short note in one place in the docs saying not to touch
the contents of the data path. This commit expands the warning to
describe more precisely what is forbidden, and to give some more detail
of the consequences, and also duplicates the warning to the other
location that documents the `path.data` setting.
2021-10-21 16:39:57 -04:00
Stuart Tettemer
005d8f0106
Script: Deprecate script context cache (#79508)
* Script: Deprecate script context cache

Deprecate the script context cache in favor of the general cache.

Users should use the following settings:
`script.max_compilations_rate` to set the max compilation rate
  for user scripts such as filter scripts.  Certain script contexts
  that submit scripts outside of the control of the user are
  exempted from this rate limit.  Examples include runtime fields,
  ingest and watcher.

`script.cache.max_size` to set the max size of the cache.

`script.cache.expire` to set the expiration time for entries in
the cache.

What's deprecated?
`script.max_compilations_rate: use-context`.  This special
setting value was used to turn on the script context-specific caches.

`script.context.$CONTEXT.cache_max_size`, use `script.cache.max_size`
instead.

`script.context.$CONTEXT.cache_expire`, use `script.cache.expire`
instead.

`script.context.$CONTEXT.max_compilations_rate`, use
`script.max_compilations_rate` instead.

The default cache size was increased from `100` to `3000`, which
was approximately the max cache size when using context-specific caches.

The default compilation rate limit was increased from `75/5m` to
`150/5m` to account for increasing uses of scripts.
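
As an illustration of the mapping above, a small helper could translate a deprecated context-specific setting name into its general replacement. This is a hedged sketch: the helper name is hypothetical and the context names used in the usage note are only examples.

```python
import re

# Suffix of a deprecated per-context setting -> general replacement setting.
_REPLACEMENTS = {
    "cache_max_size": "script.cache.max_size",
    "cache_expire": "script.cache.expire",
    "max_compilations_rate": "script.max_compilations_rate",
}

# Matches script.context.$CONTEXT.<suffix> for the three deprecated suffixes.
_CONTEXT_SETTING = re.compile(
    r"^script\.context\.[^.]+\.(cache_max_size|cache_expire|max_compilations_rate)$"
)


def replacement_for(setting_name):
    """Return the replacement setting name, or None if not deprecated here."""
    match = _CONTEXT_SETTING.match(setting_name)
    return _REPLACEMENTS[match.group(1)] if match else None
```

For instance, `replacement_for("script.context.ingest.cache_expire")` maps to `script.cache.expire`, while a general setting such as `script.cache.max_size` is left alone.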

Refs: #62899
2021-10-19 20:15:45 -05:00
Francisco Fernández Castaño
3bd8055370
Limit concurrent snapshot file restores in recovery per node (#79379)
Today we limit the max number of concurrent snapshot file restores
per recovery. This works well when the default
node_concurrent_recoveries is used (which is 2). When this limit is
increased, it is possible to exhaust the underlying repository
connection pool, affecting other workloads.

This commit adds a new setting
`indices.recovery.max_concurrent_snapshot_file_downloads_per_node` that
limits the max number of snapshot file downloads per node
during recoveries. When a recovery starts in the target node it tries
to acquire a permit that allows it to download snapshot files when it is
granted. This is communicated to the source node in the
StartRecoveryRequest. This is a rather conservative approach, since a
recovery that gets a permit to use snapshot files might not recover any
snapshot file, while a concurrent recovery that doesn't get a permit
could have taken advantage of recovering from a snapshot.
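
The permit scheme described above can be pictured as a per-node counter with a non-blocking acquire. This is a minimal sketch under that assumption; the class and method names are hypothetical and do not mirror the Elasticsearch code.

```python
import threading


class SnapshotDownloadPermits:
    """Per-node pool of permits for recoveries that may use snapshot files."""

    def __init__(self, max_concurrent_downloads):
        self._available = max_concurrent_downloads
        self._lock = threading.Lock()

    def try_acquire(self):
        """Take a permit if one is free; never block the recovery."""
        with self._lock:
            if self._available == 0:
                return False  # recover from the source node only
            self._available -= 1
            return True

    def release(self):
        """Return a permit once the recovery finishes."""
        with self._lock:
            self._available += 1
```

A recovery that fails `try_acquire()` simply proceeds without snapshot files, which is why a permit holder that never downloads anything makes the approach conservative.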

Closes #79044
Backport of #79316
2021-10-19 09:08:54 +02:00
Yannick Welsch
88ed45c3bb
Node level can match action (#78765) (#79344)
Changes can-match from a shard-level to a node-level action, which helps avoid an explosion of shard-level can-match
subrequests in clusters with many shards, that can cause stability issues. Also introduces a new search_coordination
thread pool to handle the sending and handling of node-level can-match requests.
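
To illustrate why a node-level action reduces the number of subrequests, the sketch below groups target shards by the node that holds them, so one request per node can carry all of that node's shards. The function name and the input shape are assumptions made for this example.

```python
from collections import defaultdict


def group_shards_by_node(shard_to_node):
    """Map each node to the list of shards it should check (hypothetical)."""
    by_node = defaultdict(list)
    for shard, node in shard_to_node.items():
        by_node[node].append(shard)
    # One can-match request per key instead of one per shard.
    return dict(by_node)
```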
2021-10-18 12:59:58 +02:00
Nikola Grcevski
8512037aaa
[7.x] Deprecation of transient cluster settings (#78794) (#79288)
This PR changes uses of transient cluster settings to
persistent cluster settings.

The PR also deprecates the transient settings usage.

Relates to #49540
2021-10-15 19:06:33 -04:00
Adam Locke
0b30aebcd7
A typo error (#78987) (#79204)
* A typo error

a space between 'E' and 'cluster...'

* Update example, fix headings, change notes

Co-authored-by: Adam Locke <adam.locke@elastic.co>

Co-authored-by: Marwane Chahoud <marwane.chahoud@gmail.com>
2021-10-15 08:52:13 -04:00
Henning Andersen
3eadef4774
Make disk.threshold_enabled operator only (#78822) (#79217)
Orchestrated environments should not allow users to override
`cluster.routing.allocation.disk.threshold_enabled`, so making this
operator only.

Closes #77846

Co-authored-by: David Turner <david.turner@elastic.co>
2021-10-15 11:01:00 +02:00
Henning Andersen
6045345544
Indexing_data/lz4 is now generally available (#78595)
Remove the ESS only marking for indexing_data and lz4 options.

Relates #77130
2021-10-14 09:26:07 +02:00
Adam Locke
5b86e90cdf
[DOCS] Fix default value for closed indices (#78924) (#79055)
* [DOCS] Fix default value for closed indices

#57953 introduced changes that added ESS icons to many Elasticsearch settings. As part of those changes, the default value for `cluster.indices.close.enable` was indicated as `false`, when it should be `true`. This PR updates the default value to `true`. 

Closes #78877

* Update description

* Update note to remove outdated claims
2021-10-13 08:26:32 -04:00
James Rodewig
0a35034c37
[DOCS] Update ESS support for stack.templates.enabled (#78732) (#78758)
The documentation indicates that `stack.templates.enabled` can be used in Elasticsearch Service, but it is not part of the settings allowlist in ESS. This PR makes the documentation match the state of the allowlist.

Co-authored-by: Samuel Nelson <samuel.nelson@elastic.co>
2021-10-06 09:49:57 -04:00
David Turner
2ff32f97ca
Improve docs for pre-release version compatibility (#78428) (#78432)
* Improve docs for pre-release version compatibility

Follow-up to #78317 clarifying a couple of points:

- a pre-release build can restore snapshots from released builds
- compatibility applies if at least one of the local or remote cluster
  is a released build

* Remote cluster build date nit
2021-09-29 04:59:50 -04:00
David Turner
e4718fe7f3
Add docs for pre-release version compatibility (#78317) (#78331)
The reference manual includes docs on version compatibility in various
places, but it's not clear that these docs only apply to released
versions and that the rules for pre-release versions are stricter than
folks expect. This commit adds some words to the docs for unreleased
versions which explains this subtlety.
2021-09-27 12:06:51 -04:00
Adam Locke
2174b4642d
[DOCS] Update remote cluster docs (#77043) (#78212)
* [DOCS] Update remote cluster docs

* Add files, rename files, write new stuff

* Plethora of changes

* Add test and update snippets

* Redirects, moved files, and test updates

* Moved file to x-pack for tests

* Remove older CCS page and add redirects

* Cleanup, link updates, and some rewrites

* Update image

* Incorporating user feedback and rewriting much of the remote clusters page

* More changes from review feedback

* Numerous updates, including request examples for CCS and Kibana

* More changes from review feedback

* Minor clarifications on security for remote clusters

* Incorporate review feedback

Co-authored-by: Yang Wang <ywangd@gmail.com>

* Some review feedback and some editorial changes

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
# Conflicts:
#	docs/reference/modules/network.asciidoc
#	docs/reference/modules/remote-clusters.asciidoc
#	x-pack/docs/en/security/ccs-clients-integrations/cross-cluster.asciidoc
#	x-pack/docs/en/security/ccs-clients-integrations/index.asciidoc

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2021-09-23 12:13:03 +02:00
James Rodewig
4d83194e18
[DOCS] Fix typo for script.painless.regex.enabled setting value (#77853) (#77924)
The value is `limited`, not `limit`.

Co-authored-by: AndyHunt66 <andrew.hunt@elastic.co>
2021-09-16 15:53:14 -04:00
James Rodewig
6573f79ad4
[DOCS] Fix typo in setting custom attributes when starting a node (#77617) (#77629)
Removes an excess "`".

Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>
2021-09-13 10:02:44 -04:00
Johan Nilsson Hansen
c1fcf8917b Create a sha-256 hash of the shard request cache key (#74877)
We currently use the plaintext body of a shard request as the key to the 
request cache.  This has the disadvantage that very large requests can
quickly fill up the cache due to the size of their keys.  With this commit, 
we instead use a sha-256 hash of the shard request as the cache key, 
which will use a constant (and much smaller) number of bytes.
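
The idea can be shown with the standard library. This is a sketch only; the function name is hypothetical and the serialization of the real shard request is not modeled here.

```python
import hashlib


def request_cache_key(shard_request_bytes: bytes) -> bytes:
    """Derive a fixed-size cache key from the serialized shard request."""
    # A SHA-256 digest is always 32 bytes, whatever the request size.
    return hashlib.sha256(shard_request_bytes).digest()
```

A very large request body and a tiny one both produce 32-byte keys, so key size no longer contributes to cache pressure.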
2021-09-13 14:08:00 +01:00
David Turner
f8f3420f74 Limit count of HTTP channels with tracked stats (#77303)
Today we expire the client stats for HTTP channels 5 minutes after they
close. It's possible to open a very large number of HTTP channels in 5
minutes, possibly inadvertently, and the stats for those channels can be
overwhelming.

This commit introduces a limit on the number of channels tracked by each
node which applies in addition to the age limit, and makes these limits
configurable via static settings. It drops the pruning of old stats when
starting to track a new channel and instead uses a queue to expire the
oldest stats when each channel closes if necessary to respect the count
limit; it only performs age-based expiry when retrieving the stats,
since the count limit now bounds the memory needed. Finally, it
tightens up some missing synchronization and makes sure that we expose
only immutable objects to the stats subsystem.
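
A rough sketch of the expiry scheme described above, assuming the count limit applies to the stats of closed channels. The class, its parameters, and the shape of the stats records are hypothetical.

```python
import time
from collections import deque


class HttpClientStatsTracker:
    def __init__(self, max_closed_count, max_closed_age, clock=time.monotonic):
        self._open = {}          # channel id -> stats
        self._closed = deque()   # (closed_at, stats), oldest first
        self._max_closed_count = max_closed_count
        self._max_closed_age = max_closed_age
        self._clock = clock

    def channel_opened(self, channel_id):
        # No pruning here: opening a channel only starts tracking it.
        self._open[channel_id] = {"id": channel_id}

    def channel_closed(self, channel_id):
        stats = self._open.pop(channel_id, None)
        if stats is None:
            return
        self._closed.append((self._clock(), stats))
        # Count-based expiry: drop the oldest closed stats beyond the limit.
        while len(self._closed) > self._max_closed_count:
            self._closed.popleft()

    def get_stats(self):
        # Age-based expiry happens only when the stats are retrieved.
        cutoff = self._clock() - self._max_closed_age
        while self._closed and self._closed[0][0] < cutoff:
            self._closed.popleft()
        # Hand out copies so callers cannot mutate tracked state.
        return [dict(s) for s in self._open.values()] + \
               [dict(s) for _, s in self._closed]
```

Because the queue is bounded by the count limit, deferring age-based expiry to retrieval time does not let memory grow without bound.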
2021-09-08 07:42:34 +01:00
Artem Prigoda
609bc8237f
[DOCS] Fix bullet list layout for indices.recovery.use_snapshots (#77296)
Add a missed `+` which got lost during backporting.
2021-09-06 10:12:42 +02:00
Henning Andersen
62699ed810
Indexing_data/lz4, recover from snapshot ESS only (#77130)
Compression using indexing_data or lz4 as well as recovery from snapshot
are primarily intended for ESS and are therefore marked ESS only in docs.

Relates #76237 and #74587
2021-09-01 18:21:57 +02:00
James Rodewig
137bad2165
[DOCS] Fix formatting for snapshot_meta thread pool (#76973) (#76982)
Co-authored-by: Howard <danielhuang@tencent.com>
2021-08-26 10:47:56 -04:00
Martijn van Groningen
0a9649ec0a
Improve fault-detection.asciidoc (#76826)
Backport of #76821 to 7.x branch.

Add section to fault-detection.asciidoc about nodes being removed from cluster
due to slow cluster state applying.
2021-08-23 14:43:33 +02:00
Tim Brooks
90155a741f
Remote compression scheme default to deflate (#76580)
Currently the cluster.remote.*.transport.compression_scheme setting
defaults to the transport.compression_scheme value. This commit
modifies this to default to deflate (the existing compression scheme
prior to 7.14) when cluster.remote.*.transport.compress is explicitly set.
This will ensure that users do not accidentally change their compression
scheme for 7.x.
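
The resolution order described above might look like the following sketch. The function name and the flat settings dictionary are assumptions; an explicitly configured per-cluster scheme is assumed to take precedence over both fallbacks.

```python
def remote_compression_scheme(settings, alias):
    """Resolve the compression scheme for one remote cluster (hypothetical)."""
    prefix = f"cluster.remote.{alias}.transport."
    explicit = settings.get(prefix + "compression_scheme")
    if explicit is not None:
        return explicit
    # compress set explicitly for this remote: keep the pre-7.14 scheme.
    if prefix + "compress" in settings:
        return "deflate"
    # Otherwise inherit the node-wide scheme (None if it is unset here).
    return settings.get("transport.compression_scheme")
```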
2021-08-16 16:41:04 -06:00
Tim Brooks
f52ca3c60b
Respond with same compression scheme received (#76514)
This is related to #73497. Currently, we only use the configured
transport.compression_scheme setting when compressing a request or a
response. Additionally, the cluster.remote.*.compression_scheme
setting is ignored. This commit fixes this behavior by respecting the
per-cluster setting. Additionally, it resolves confusion around inbound
and outbound connections by always responding with the same scheme that
was received. This allows remote connections to have different schemes
than local connections.
2021-08-14 15:21:18 -06:00
Tim Brooks
7e4fc3d8e1
Add docs for production ready compression settings (#76504)
In 7.15, we intend for the indexing_data compression level and the
compression scheme lz4 to no longer be experimental. This commit
updates the documentation to reflect this. Additionally, it adds
missing docs for the cluster.remote.*.transport.compression_scheme
setting.

Relates to #73497.
2021-08-13 13:28:56 -06:00
Francisco Fernández Castaño
8cc9a4af92
[7.x] Add peer recoveries using snapshot files when possible (#76482)
This commit adds peer recoveries from snapshots. It allows establishing a replica by downloading file data from a snapshot rather than transferring the data from the primary.

Enabling this feature is done on the repository definition. Repositories having the setting `use_for_peer_recovery=true` will be consulted to find a good snapshot when recovering a shard.

Relates #73496
Backport of #76237
2021-08-13 15:10:26 +02:00
David Turner
8e5ff21eb2 Add note on special network values docs (#75779)
The special values `_global_`, `_site_`, `0.0.0.0` and so on may resolve
to multiple addresses, of which one is chosen to be the publish address.
This commit generalises the warning about reachability as applied to
DNS-resolved hostnames to also apply to these special values.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-08-09 17:06:13 +01:00
Francisco Fernández Castaño
8474d93b54
Add peer recovery planners that take into account available snapshots (#76239)
This commit adds a new set of classes that would compute a peer
recovery plan, based on source files + target files + available
snapshots. When possible it would try to maximize the number of
files used from a snapshot. It uses repositories with `use_for_peer_recovery`
setting set to true.

It adds a new recovery setting `indices.recovery.use_snapshots`

Relates #73496
Backport of #75840
2021-08-09 15:41:02 +02:00
elasticsearchmachine
5afcf928da
[DOCS] Document regex circuit breaker (#76048) (#76131)
Documents the `script.painless.regex.enabled` and
`script.painless.regex.limit-factor` cluster settings.

Relates to #63029.

Closes #75199.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-08-04 16:49:22 -04:00
David Turner
aca4a092a1 Advise away from a ping schedule on remote connxns (#75513)
Today the docs for remote cluster connections use `ping_schedule` fairly
liberally, and don't mention that you should prefer TCP keepalives
wherever possible. This commit reduces the use of this setting in the
examples and adjusts the description of the setting to include a note
about TCP keepalives instead.
2021-07-20 19:09:33 +01:00
James Rodewig
f97d65771f
[DOCS] Note required node roles and data tiers (#74566) (#75048)
Closes #74528 and #74565
2021-07-07 10:06:42 -04:00
Lisa Cawley
ca7999d458
[DOCS] Clean up xpack.ml.enabled details (#74573) (#74776) 2021-06-30 10:22:13 -07:00
Tim Brooks
1ae256f424
Add additional transport compression options (#74719)
This commit is related to #73497. It adds two new settings. The first setting
is transport.compression_scheme. This setting allows the user to
configure LZ4 or DEFLATE as the transport compression. Additionally, it
modifies transport.compress to support the value indexing_data. When
this setting is set to indexing_data only messages which are primarily
composed of raw source data will be compressed. These are bulk, operations
recovery, and shard changes messages.
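
A small sketch of the decision implied by the `indexing_data` value. The message-kind labels and the function name are hypothetical, chosen only to mirror the three message types named above.

```python
# Hypothetical labels for messages composed mainly of raw source data.
RAW_SOURCE_MESSAGE_KINDS = frozenset(
    {"bulk", "operations_recovery", "shard_changes"}
)


def should_compress(transport_compress, message_kind):
    """Decide whether a transport message is compressed."""
    if transport_compress == "true":
        return True
    if transport_compress == "indexing_data":
        # Only messages dominated by raw source data are compressed.
        return message_kind in RAW_SOURCE_MESSAGE_KINDS
    return False
```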
2021-06-29 16:13:08 -06:00
Armin Braun
d64a72c127
Snapshot Pagination and Scalability Improvements Backport to 7.x (#74676)
Backport of the recently introduced snapshot pagination and scalability improvements listed below.
Merged as a single backport because the snapshot status API logic had massively diverged between `master` and `7.x`. With the work in the PRs below, the logic in `master` and `7.x` is once again closely aligned.

#72842
#73172
#73199
#73570 
#73952
#74236 
#74451 (this one is only partly applicable as it was mainly a change to master to align `master` and `7.x` branches)
2021-06-29 15:16:26 +02:00