Commit graph

912 commits

Author SHA1 Message Date
Iraklis Psaroudakis
f284cc16f4
Convert disk watermarks to RelativeByteSizeValues (#88719)
* Convert disk watermarks to RelativeByteSizeValues

Similar to the existing watermark setting for the frozen tier.

Pre-requisite for PR 88639 that plans to introduce max headroom
settings for the disk watermarks, similar to the frozen tier max
headroom setting.

* Add changelog

* Revert 20gb to 20GB

* Make formatNoTrailingZerosPercent non static

* ByteSizeValue.MINUS_ONE

* Remove getMinimumTotalSizeForBelowWatermark

* Remove comment

* Fix minor stuff

* Make parsing of RelativeByteSizeValue faster

Mimics the older definitelyNotPercentage function

* Remove Locale from Strings.format

* More MINUS_ONE
2022-07-22 18:39:07 +03:00
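
For readers unfamiliar with the idea behind #88719, a watermark that is either relative (a percentage) or absolute (a byte size) can be sketched roughly as follows. This is a simplified Python illustration and not the Elasticsearch implementation: `RelativeOrAbsolute`, `parse_watermark`, and the unit table are hypothetical names, and only a handful of suffixes are handled.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, reduced unit table (binary multiples), for illustration only.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


@dataclass(frozen=True)
class RelativeOrAbsolute:
    """Exactly one of the two fields is set."""
    ratio: Optional[float] = None      # set for "85%"-style values
    size_bytes: Optional[int] = None   # set for "20gb"-style values


def parse_watermark(text: str) -> RelativeOrAbsolute:
    s = text.strip().lower()
    # Cheap suffix check first: anything ending in "%" is a relative value.
    if s.endswith("%"):
        return RelativeOrAbsolute(ratio=float(s[:-1]) / 100.0)
    # Try longer suffixes first so "gb" is not mistaken for plain "b".
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if s.endswith(suffix):
            number = float(s[: -len(suffix)])
            return RelativeOrAbsolute(size_bytes=int(number * _UNITS[suffix]))
    raise ValueError(f"not a percentage or byte size: {text!r}")
```

Under these assumptions, `parse_watermark("85%")` yields a relative value and `parse_watermark("20GB")` an absolute one, so callers can branch on which field is set.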
Leaf-Lin
945cb27782
[DOCS] Adding discovery troubleshooting link in the master get help page (#87344)
* Adding discovery troubleshooting link

* Add tags to pull in discovery troubleshooting content

* Move discovery troubleshooting to separate page and add redirects

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-07-06 15:51:43 -04:00
Iraklis Psaroudakis
50d2cf31b8
Periodic warning for 1-node cluster w/ seed hosts (#88013)
For fully-formed single-node clusters, emit a periodic warning if seed_hosts has been set to a non-empty list.

Closes #85222
2022-06-30 16:35:15 +03:00
David Turner
80f7af58f8
More detail in discovery troubleshooting docs (#86930)
In #85074 we added docs on discovery troubleshooting that really only
talked about troubleshooting master elections. There's also the case
where the master is elected fine but some other node can't join it. This
commit adds troubleshooting docs about that too.

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-06-06 08:33:45 +01:00
Pooya Salehi
beadcaf631
Increase force_merge threadpool size (#87082)
Changes the default size used for the force_merge threadpool to 1/8 of
the allocated processors, with a minimum value of 1.

Closes #84943
2022-05-25 15:45:28 +02:00
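
The sizing rule described in #87082 amounts to a one-line calculation. The sketch below is in Python for illustration; the function name is hypothetical, and rounding down is an assumption, since the commit message only says "1/8 of the allocated processors, with a minimum value of 1".

```python
def default_force_merge_pool_size(allocated_processors: int) -> int:
    # One eighth of the allocated processors (rounded down, by assumption),
    # but never fewer than one thread.
    return max(1, allocated_processors // 8)
```

With this rule a node with fewer than 16 processors still gets a single force-merge thread, while a 64-processor node would get 8.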
Joe Gallo
79990fa49b
Remove "Push back excessive requests for stats (#83832)" (#87054) 2022-05-23 12:58:02 -04:00
David Turner
79f181d208
Reduce resource needs of join validation (#85380)
Fixes a few scalability issues around join validation:

- compresses the cluster state sent over the wire
- shares the serialized cluster state across multiple nodes
- forks the decompression/deserialization work off the transport thread

Relates #77466
Closes #83204
2022-04-26 12:15:54 +01:00
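
The three changes listed in #85380 can be illustrated with a small sketch. It is written in Python with `json` and `zlib` as stand-ins for the real serialization and compression, which are assumptions made purely for illustration; `JoinValidationSender` and `validate_on_worker` are hypothetical names.

```python
import json
import zlib
from concurrent.futures import Future, ThreadPoolExecutor


class JoinValidationSender:
    """Serializes and compresses a cluster state once per version and
    reuses the same bytes for every node that needs validating."""

    def __init__(self) -> None:
        self._cached_version = None
        self._cached_bytes = None
        self.serializations = 0  # counts how often real work was done

    def payload_for(self, version: int, cluster_state: dict) -> bytes:
        if self._cached_version != version:
            raw = json.dumps(cluster_state, sort_keys=True).encode("utf-8")
            self._cached_bytes = zlib.compress(raw)  # compressed on the wire
            self._cached_version = version
            self.serializations += 1
        return self._cached_bytes


def validate_on_worker(executor: ThreadPoolExecutor, payload: bytes) -> Future:
    # Decompression and deserialization run on a worker thread, keeping
    # the (simulated) transport thread free.
    return executor.submit(lambda: json.loads(zlib.decompress(payload)))
```

The point of the cache is that validating many joining nodes against the same state version costs one serialization rather than one per node.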
David Turner
33a553f61f Fix up whitespace error introduced in #85948 2022-04-19 07:58:10 +01:00
David Turner
ce004d49e7
More docs re. removing cluster.initial_master_nodes (#85948)
Ensures that on every page of the docs that mentions
`cluster.initial_master_nodes` also mentions that this setting must be
removed after bootstrapping completes.
2022-04-19 07:54:43 +01:00
David Turner
6a273886e9
Add technical docs on diagnosing instability etc (#85074)
Copies some internal troubleshooting docs to the reference manual for
wider use.

Co-authored-by: James Rodewig <james.rodewig@gmail.com>
2022-03-31 09:01:10 +01:00
James Rodewig
73e56e3cf8
[DOCS] Reuse data tier content in node role docs (#84346)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2022-03-28 14:32:01 -07:00
David Turner
fd76f9c5d1
Fix auto-bootstrap docs (#85215)
Today it's no longer true that by default nodes will auto-discover other
nodes on the same host and bootstrap them all into a cluster. This
commit fixes the docs on auto-bootstrapping to recognise this.
2022-03-22 16:35:48 +00:00
David Turner
ff742fcb27
More balanced docs about NFS etc (#85060)
Today we don't really say anything about the requirements for the data
path in terms of correctness, and we specifically say to avoid NFS for
performance reasons. This isn't wholly accurate: some NFS
implementations work just fine. This commit documents a more balanced
position on local vs remote storage.
2022-03-18 13:01:59 +00:00
Mary Gouseti
ed0bb2a8af
Push back excessive requests for stats (#83832)
Resolves #51992
2022-02-28 08:46:18 +01:00
Tobias Stadler
e3deacf547
[DOCS] Fix typos (#83895) 2022-02-15 12:42:17 -05:00
James Rodewig
2f03112b5b
[DOCS] Synced with 8.0 stack upgrade changes (#83489) (#83596)
This moves the bulk of the upgrade information into the consolidated upgrade guide, but leaves the primary upgrade topic in place as a cross reference.

Relates to: https://github.com/elastic/stack-docs/pull/1970

Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
(cherry picked from commit f6473d71f9)

Co-authored-by: debadair <debadair@elastic.co>
2022-02-07 11:01:42 -05:00
Tanguy Leroux
7f827bbab8
Document and test operator-only node bandwidth recovery settings (#83372)
This commit updates the Operator-only functionality doc to 
mention the operator only settings introduced in #82819.

It also adds an integration test for those operator only 
settings that would have caught #83359.
2022-02-02 11:50:19 +01:00
David Turner
a062bdf42f
Add docs for node bandwidth settings (#83361)
Relates #82819
2022-02-01 12:19:00 +00:00
erictung1999
58ffc42f5f
[DOCS] Fix typo (#82100)
Fix typo under `indices.recovery.max_concurrent_snapshot_file_downloads_per_node`
2022-01-13 09:42:22 -05:00
James Rodewig
dfb9f6f18d
[DOCS] Document 8.0 BWC support for CCS (#80809)
As of 8.0, the compatibility window for cross-cluster search (CCS) to an earlier release will be one minor release. This updates the CCS docs and adds a related 8.0 breaking change.

Closes https://github.com/elastic/elasticsearch/issues/80782
2022-01-11 10:33:12 -05:00
James Rodewig
950eb775fe
[DOCS] Correct yaml syntax in example configuration (#82297) (#82392)
(cherry picked from commit 432fd79c46)

Co-authored-by: mymindstorm <mymindstorm@evermiss.net>
2022-01-10 17:19:27 -05:00
James Rodewig
7142b47e69
[DOCS] Add prerequisites for CCS (#81782)
* Adds a prerequisites section covering remote cluster config, node roles, and security.
* Moves existing content about remote cluster config to the prereqs.
* Updates the remote cluster docs to include information about eligible gateway nodes and tagging for gateway nodes.

Closes https://github.com/elastic/elasticsearch/issues/72001
2022-01-10 09:17:44 -05:00
Stef Nestor
e2d66cd257
[DOCS] Thread pool settings are static (#81887)
Starting in 5.1, thread pool settings can no longer be updated dynamically; see the [5.0 breaking changes doc](https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking_50_settings_changes.html#_threadpool_settings).
2021-12-20 11:20:06 -05:00
Leaf-Lin
82592c4268
[DOCS] Update remote cluster version compatibility table for 8.x (#81239)
Updates the remote clusters version compatibility table to include 7.17 and 8.x versions.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-12-16 11:16:24 -05:00
David Turner
5b9ce9e820
Remove dead code from same-shard decider (#81520)
Today the same-shard allocation decider falls back to checking the
hostname if the node has no host address. In practice nodes will always
have an address so the fallback is dead code. This commit removes that
dead code.

Relates #80702 which will add the ability to distinguish nodes by
hostname regardless of whether they have an address or not, and #80767
which optimizes this area of code - this refactoring should make the
optimization simpler.
2021-12-09 08:42:25 +00:00
David Turner
7dd32fb027
Reduce verbosity-increase timeout to 3m (#81118)
Today we increase the verbosity of discovery failures after 5 minutes
without a master. Unfortunately 5 minutes is a common orchestration
timeout, so if discovery is broken then we see nodes being shut down
just before they start to emit useful logs. This commit reduces the
default timeout to 3 minutes to address that.
2021-11-30 09:52:39 +00:00
David Turner
8cf4c7b6fb
Remove last few mentions of Zen discovery (#80410)
We have a few leftover mentions of `zen` discovery, mostly for
historical/BwC reasons, which this commit removes.

Prior to this commit the default value for `discovery.type` was `zen`
but this was not written down anywhere or officially supported: the two
options were to set it to `single-node` or to omit it entirely. This
commit changes the default to `multi-node` and documents this.

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2021-11-09 09:52:06 +01:00
Stuart Tettemer
30e15ba838
Script: Time series compile and cache evict metrics (#79078)
Collects compilation and cache eviction metrics for
each script context.

Metrics are available in _nodes/stats in 5m/15m/1d
buckets.

Refs: #62899
2021-11-03 13:13:42 -05:00
David Turner
6cc0a41af0
Expand warning about modifying data path contents (#79649)
Today we have a short note in one place in the docs saying not to touch
the contents of the data path. This commit expands the warning to
describe more precisely what is forbidden, and to give some more detail
of the consequences, and also duplicates the warning to the other
location that documents the `path.data` setting.
2021-10-21 16:28:43 -04:00
Stuart Tettemer
808b70d2f9
Script: Restore the scripting general cache (#79453)
Deprecate the script context cache in favor of the general cache.

Users should use the following settings:
`script.max_compilations_rate` to set the max compilation rate
  for user scripts such as filter scripts.  Certain script contexts
  that submit scripts outside of the control of the user are
  exempted from this rate limit.  Examples include runtime fields,
  ingest and watcher.

`script.cache.max_size` to set the max size of the cache.

`script.cache.expire` to set the expiration time for entries in
the cache.

What's deprecated?
`script.max_compilations_rate: use-context`.  This special
setting value was used to turn on the script context-specific caches.

`script.context.$CONTEXT.cache_max_size`, use `script.cache.max_size`
instead.

`script.context.$CONTEXT.cache_expire`, use `script.cache.expire`
instead.

`script.context.$CONTEXT.max_compilations_rate`, use
`script.max_compilations_rate` instead.

The default cache size was increased from `100` to `3000`, which
was approximately the max cache size when using context-specific caches.

The default compilation rate limit was increased from `75/5m` to
`150/5m` to account for increasing uses of scripts.

System script contexts can now opt-out of compilation rate limiting
using a flag rather than a sentinel rate limit value.

7.16: Script: Deprecate script context cache #79508
Refs: #62899

7.16: Script: Opt-out system contexts from script compilation rate limit #79459
Refs: #62899
2021-10-21 07:57:27 -05:00
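
The deprecation mapping spelled out in #79453 can be expressed as a small lookup. This Python sketch is only an illustration of the mapping in the commit message; `replacement_for` is a hypothetical helper, not part of Elasticsearch.

```python
import re
from typing import Optional

# Deprecated per-context suffix -> general setting named in the commit message.
_SUFFIX_REPLACEMENTS = {
    "cache_max_size": "script.cache.max_size",
    "cache_expire": "script.cache.expire",
    "max_compilations_rate": "script.max_compilations_rate",
}

# Matches script.context.$CONTEXT.<suffix>
_CONTEXT_SETTING = re.compile(r"script\.context\.([^.]+)\.(\w+)")


def replacement_for(setting_name: str) -> Optional[str]:
    """Return the general-cache setting that replaces a deprecated
    context-specific one, or None if the name is not deprecated."""
    match = _CONTEXT_SETTING.fullmatch(setting_name)
    if match is None:
        return None
    return _SUFFIX_REPLACEMENTS.get(match.group(2))
```

For example, `replacement_for("script.context.ingest.cache_expire")` returns `"script.cache.expire"`, while a general setting such as `script.cache.max_size` returns `None`.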
Francisco Fernández Castaño
2b4fe8fc7b
Limit concurrent snapshot file restores in recovery per node (#79316)
Today we limit the max number of concurrent snapshot file restores
per recovery. This works well when the default
node_concurrent_recoveries is used (which is 2). When this limit is
increased, it is possible to exhaust the underlying repository
connection pool, affecting other workloads.

This commit adds a new setting,
`indices.recovery.max_concurrent_snapshot_file_downloads_per_node`, that
limits the max number of snapshot file downloads per node during
recoveries. When a recovery starts on the target node it tries to
acquire a permit; if the permit is granted, the recovery may download
snapshot files. The outcome is communicated to the source node in the
StartRecoveryRequest. This is a rather conservative approach: a recovery
that gets a permit might not recover any snapshot files, while a
concurrent recovery that doesn't get a permit could have taken
advantage of recovering from a snapshot.

Closes #79044
2021-10-18 18:17:27 +02:00
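
The per-node permit scheme described in #79316 resembles a non-blocking semaphore acquire. The sketch below is in Python for illustration; `SnapshotDownloadPermits`, `start_recovery`, and the `can_download_snapshot_files` field are hypothetical names, not the actual recovery code.

```python
import threading


class SnapshotDownloadPermits:
    """Node-wide pool of permits for downloading snapshot files."""

    def __init__(self, max_concurrent_downloads: int) -> None:
        self._permits = threading.BoundedSemaphore(max_concurrent_downloads)

    def try_acquire(self) -> bool:
        # Never blocks: a recovery without a permit simply proceeds
        # without snapshot files.
        return self._permits.acquire(blocking=False)

    def release(self) -> None:
        self._permits.release()


def start_recovery(permits: SnapshotDownloadPermits) -> dict:
    # The target node decides up front and tells the source node
    # the outcome in the start-recovery request.
    granted = permits.try_acquire()
    return {"can_download_snapshot_files": granted}
```

Because the permit is taken at the start of the recovery, it may be held by a recovery that ends up not using any snapshot files, which is the conservative trade-off the commit message notes.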
Yannick Welsch
13487b1ed6
Node level can match action (#78765)
Changes can-match from a shard-level to a node-level action, which helps avoid an explosion of shard-level can-match
subrequests in clusters with many shards, that can cause stability issues. Also introduces a new search_coordination
thread pool to handle the sending and handling of node-level can-match requests.
2021-10-18 10:13:44 +02:00
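
The shift from shard-level to node-level requests in #78765 is essentially a grouping step. A minimal Python sketch, assuming a hypothetical mapping from shard identifier to the node holding it:

```python
from collections import defaultdict
from typing import Dict, List


def group_shards_by_node(shard_to_node: Dict[str, str]) -> Dict[str, List[str]]:
    """Collapse per-shard targets into one batch per node, so the
    coordinator sends one can-match request per node instead of one
    per shard."""
    by_node: Dict[str, List[str]] = defaultdict(list)
    for shard, node in shard_to_node.items():
        by_node[node].append(shard)
    return dict(by_node)
```

With many shards per node, the number of requests then scales with the number of nodes rather than the number of shards.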
Nikola Grcevski
055c770083
Deprecation of transient cluster settings (#78794)
This PR changes uses of transient cluster settings to
persistent cluster settings. 

The PR also deprecates the transient settings usage.

Relates to #49540
2021-10-15 13:00:52 -04:00
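
The kind of change made throughout #78794 can be pictured as rewriting a cluster update settings request body so that everything lives under `persistent`. The helper below is a hypothetical Python sketch, not code from the PR.

```python
def to_persistent(body: dict) -> dict:
    """Rewrite a cluster-settings request body so that settings given
    under "transient" are sent under "persistent" instead."""
    persistent = dict(body.get("persistent", {}))
    # If a key appears in both sections, the transient value is kept here,
    # an assumption of this sketch.
    persistent.update(body.get("transient", {}))
    return {"persistent": persistent}
```

For example, a body of `{"transient": {"indices.recovery.max_bytes_per_sec": "50mb"}}` becomes `{"persistent": {"indices.recovery.max_bytes_per_sec": "50mb"}}`.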
Henning Andersen
57e503ca78
[DOCS] disk.threshold_enabled not cloud (#79225)
Mark `cluster.routing.allocation.disk.threshold_enabled` not for cloud
and add it to list of operator only settings.

Relates #78822
2021-10-15 16:19:04 +02:00
Adam Locke
529986e9b1
A typo error (#78987) (#79203)
* A typo error

a space between 'E' and 'cluster...'

* Update example, fix headings, change notes

Co-authored-by: Adam Locke <adam.locke@elastic.co>

Co-authored-by: Marwane Chahoud <marwane.chahoud@gmail.com>
2021-10-15 08:52:03 -04:00
Adam Locke
c3b67ee0ae
[DOCS] Fix default value for closed indices (#78924)
* [DOCS] Fix default value for closed indices

#57953 introduced changes that added ESS icons to many Elasticsearch settings. As part of those changes, the default value for `cluster.indices.close.enable` was indicated as `false`, when it should be `true`. This PR updates the default value to `true`. 

Closes #78877

* Update description

* Update note to remove outdated claims
2021-10-13 08:14:01 -04:00
Samuel Nelson
c4f5d41fe7
[DOCS] Update ESS support for stack.templates.enabled (#78732)
The documentation indicates that `stack.templates.enabled` can be used in Elasticsearch Service, but it is not part of the settings allowlist in ESS. This PR makes the documentation match the state of the allowlist.
2021-10-06 09:37:30 -04:00
David Turner
07a2acac93
Improve docs for pre-release version compatibility (#78428)
* Improve docs for pre-release version compatibility

Follow-up to #78317 clarifying a couple of points:

- a pre-release build can restore snapshots from released builds
- compatibility applies if at least one of the local or remote cluster
  is a released build

* Remote cluster build date nit
2021-09-29 04:49:07 -04:00
David Turner
4782cf4d91
Add docs for pre-release version compatibility (#78317)
The reference manual includes docs on version compatibility in various
places, but it's not clear that these docs only apply to released
versions and that the rules for pre-release versions are stricter than
folks expect. This commit adds some words to the docs for unreleased
versions which explain this subtlety.
2021-09-27 16:56:35 +01:00
Adam Locke
6940673e8a
[DOCS] Update remote cluster docs (#77043)
* [DOCS] Update remote cluster docs

* Add files, rename files, write new stuff

* Plethora of changes

* Add test and update snippets

* Redirects, moved files, and test updates

* Moved file to x-pack for tests

* Remove older CCS page and add redirects

* Cleanup, link updates, and some rewrites

* Update image

* Incorporating user feedback and rewriting much of the remote clusters page

* More changes from review feedback

* Numerous updates, including request examples for CCS and Kibana

* More changes from review feedback

* Minor clarifications on security for remote clusters

* Incorporate review feedback

Co-authored-by: Yang Wang <ywangd@gmail.com>

* Some review feedback and some editorial changes

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
2021-09-22 16:02:33 -04:00
James Rodewig
2b2f0e1d7f
[DOCS] Remove the listener thread pool (#78194)
Changes:
* Removes docs for the `listener` thread pool
* Adds an 8.0 breaking change for the thread pool removal

Relates to #53314 and #53049
2021-09-22 13:41:05 -04:00
AndyHunt66
a5030ef407
[DOCS] Fix typo for script.painless.regex.enabled setting value (#77853)
The value is `limited`, not `limit`.
2021-09-16 13:59:58 -04:00
Johan Nilsson Hansen
553e8dcb07
Create a sha-256 hash of the shard request cache key (#74877)
We currently use the plaintext body of a shard request as the key to the 
request cache.  This has the disadvantage that very large requests can
quickly fill up the cache due to the size of their keys.  With this commit, 
we instead use a sha-256 hash of the shard request as the cache key, 
which will use a constant (and much smaller) number of bytes.
2021-09-13 08:55:59 +01:00
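
The effect of #74877 on key size is easy to demonstrate. This Python sketch uses `hashlib` purely for illustration; `cache_key` is a hypothetical name and the real cache key is built elsewhere in Elasticsearch.

```python
import hashlib


def cache_key(request_body: bytes) -> bytes:
    # A SHA-256 digest is always 32 bytes, however large the request is.
    return hashlib.sha256(request_body).digest()
```

A 1 MB request body and a 10-byte one both map to 32-byte keys, so key storage no longer grows with request size.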
David Turner
1045abe71f
Limit count of HTTP channels with tracked stats (#77303)
Today we expire the client stats for HTTP channels 5 minutes after they
close. It's possible to open a very large number of HTTP channels in 5
minutes, possibly inadvertently, and the stats for those channels can be
overwhelming.

This commit introduces a limit on the number of channels tracked by each
node which applies in addition to the age limit, and makes these limits
configurable via static settings. It drops the pruning of old stats when
starting to track a new channel and instead uses a queue to expire the
oldest stats when each channel closes if necessary to respect the count
limit; it only performs age-based expiry when retrieving the stats,
since the count limit now bounds the memory needed. Finally, it
tightens up some missing synchronization and makes sure that we expose
only immutable objects to the stats subsystem.
2021-09-08 07:25:57 +01:00
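
The expiry scheme described in #77303 (count-based eviction when a channel closes, age-based expiry only when stats are read) can be sketched as follows. This is an illustrative Python model with hypothetical names, assuming each channel id is closed at most once; it is not the actual implementation.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict


class ClosedChannelStats:
    def __init__(self, max_closed: int, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_closed = max_closed
        self._max_age = max_age_seconds
        self._clock = clock
        # channel_id -> (closed_at, stats), oldest closure first
        self._closed: "OrderedDict[str, tuple]" = OrderedDict()

    def on_close(self, channel_id: str, stats: dict) -> None:
        self._closed[channel_id] = (self._clock(), stats)
        # Count limit: drop the oldest entries as soon as it is exceeded.
        while len(self._closed) > self._max_closed:
            self._closed.popitem(last=False)

    def get_stats(self) -> Dict[str, dict]:
        # Age limit: only enforced when somebody asks for the stats.
        cutoff = self._clock() - self._max_age
        while self._closed:
            oldest_id, (closed_at, _) = next(iter(self._closed.items()))
            if closed_at >= cutoff:
                break
            del self._closed[oldest_id]
        return {cid: stats for cid, (_, stats) in self._closed.items()}
```

Since the count limit already bounds memory, deferring the age check to read time avoids doing pruning work on every new channel.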
Howard
4432b39112
[DOCS] Fix formatting for snapshot_meta thread pool (#76973) 2021-08-26 10:36:26 -04:00
Martijn van Groningen
8a1deff75a
Improve fault-detection.asciidoc (#76821)
Add section to fault-detection.asciidoc about nodes being removed from cluster
due to slow cluster state applying.
2021-08-23 14:31:06 +02:00
Tim Brooks
673e8e17f4
Enable LZ4 transport compression by default (#76326)
This commit enables LZ4 transport compression by default at the
indexing_data level.

Relates to #73497.
2021-08-17 12:19:42 -06:00
Tim Brooks
e6fd459a6e
Respond with same compression scheme received (#76372)
This is related to #73497. Currently, we only use the configured
transport.compression_scheme setting when compressing a request or a
response. Additionally, the cluster.remote.*.compression_scheme
setting is ignored. This commit fixes this behavior by respecting the
per-cluster setting. Additionally, it resolves confusion around inbound
and outbound connections by always responding with the same scheme that
was received. This allows remote connections to have different schemes
than local connections.
2021-08-13 13:29:22 -06:00
Francisco Fernández Castaño
2ebe5cd075
Add peer recoveries using snapshot files when possible (#76237)
This commit adds peer recoveries from snapshots. It allows establishing a replica by downloading file data from a snapshot rather than transferring the data from the primary. 

Enabling this feature is done on the repository definition. Repositories having the setting `use_for_peer_recovery=true` will be consulted to find a good snapshot when recovering a shard.

Relates #73496
2021-08-13 10:42:16 +02:00
Tim Brooks
425b7b280b
Add docs for production ready compression settings (#76441)
In 7.15, we intend for the indexing_data compression level and the
compression scheme lz4 to no longer be experimental. This commit
updates the documentation to reflect this. Additionally, it adds
missing docs for the cluster.remote.*.transport.compression_scheme
setting.

Relates to #73497.
2021-08-12 16:48:56 -06:00