948 commits

Author SHA1 Message Date
David Turner
822dc713d8
Add note on name resolution during startup (#95266)
Notes that the transport publish address is resolved once during
startup, plus advice to ensure that this name resolution doesn't vary by
location.
2023-04-17 14:42:15 +01:00
David Turner
f0989404ab
Bootstrapping docs clarifications (#94977)
Explains why you should remove `cluster.initial_master_nodes`, and
rewords some of the other sections a little for (subjectively) improved
readability.
2023-04-03 14:43:12 +01:00
David Kilfoyle
7cd484ac95
Revert "Cross-reference disclaimer" (#94829)
* Revert "Cross-reference disclaimer (#94801)"

This reverts commit 902649be31.

* Highlight sentences about removing `cluster.initial_master_nodes` setting
2023-03-28 11:17:53 -04:00
Stef Nestor
902649be31
Cross-reference disclaimer (#94801)
👋🏼 howdy, team! Can we cross-pollinate the [important banner](https://www.elastic.co/guide/en/elasticsearch/reference/master/important-settings.html#initial_master_nodes) from the `cluster.initial_master_nodes` setting page to the related [bootstrap doc](https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-discovery-bootstrap-cluster.html#bootstrap-cluster-name), so that users don't misread the latter's "This is only required the first time a cluster starts up" as saying they don't need to comment out these settings?
2023-03-28 00:16:40 -04:00
David Turner
421c2d4731
Add request/response body logging to HTTP tracer (#93133)
Adds another logger, `org.elasticsearch.http.HttpBodyTracer`, which logs
the body of every HTTP request and response as well as the usual
summaries.
2023-03-15 11:13:36 -04:00
David Kilfoyle
32f7d046b7
Update http.max_content_length description (#94430)
2023-03-09 10:58:50 -05:00
Stef Nestor
a2837a2e3f
Confirm http.max_content_length for compressed (#94408)
👋 Per [StackOverflow](https://stackoverflow.com/questions/55724839/elasticsearch-http-max-content-length-when-compressed), can we add a note that `http.max_content_length` applies to the compressed HTTP size?
2023-03-09 09:21:36 -05:00
Abdon Pijpelink
2808512397
[DOCS] Improve watermark troubleshooting documentation (#94222)
2023-03-01 14:34:14 +01:00
Zacson
031854b6f4
[DOCS] Correct the calculation rules for limiting the total number of cluster frozen shards (#93764)
2023-02-23 10:30:03 +01:00
Pooya Salehi
93a897c89d
Update snapshot threadpool size doc (#93655)
Co-authored-by: David Turner <david.turner@elastic.co>
2023-02-09 17:45:45 +01:00
Sylvain Wallez
484d3f4ada
Fixes CORS headers needed by Elastic clients (#85791)
* Fixes CORS headers needed by Elastic clients

Updates the default value for the `http.cors.allow-headers`
setting to include headers used by Elastic client libraries.

Also adds the `access-control-expose-headers` header to responses to
CORS requests so that clients can successfully perform their product
check.
2023-02-09 16:44:37 +01:00
Daniel Mitterdorfer
5ec28cc875
Document correct get thread pool size (#93541)
In #92309 we have aligned the size of the `search` and the `get` thread
pool but the docs still contain the prior `get` thread pool size. With
this commit we also align the docs.

Relates #92309
2023-02-08 07:19:55 +01:00
David Turner
4c68382065
Capture thread dump on ShardLockObtainFailedException (#93458)
We sometimes see a `ShardLockObtainFailedException` when a shard failed
to shut down as fast as we expected, often because a node left and
rejoined the cluster. Sometimes this is because it was held open by
ongoing scrolls or PITs, but other times it may be because the shutdown
process itself is too slow. With this commit we add the ability to
capture and log a thread dump at the time of the failure to give us more
information about where the shutdown process might be running slowly.

Relates #93226
2023-02-02 11:17:40 -05:00
Stef Nestor
eb1de9493e
[+DOC] node_concurrent_recoveries default (#90330)
Notes that `node_concurrent_recoveries` default is 2 (same as both sub-settings which already note that).
2023-01-18 13:53:48 +01:00
David Turner
dfab580976
Limit length of lag detector hot threads log lines (#92851)
If debug logging is enabled then the lag detector will capture and
report the hot threads of a lagging node. In some cases the resulting
log message can be very large, exceeding 10kiB, which means it is
truncated in most logging setups. The relevant thread(s) may be waiting
on I/O, which is not considered "hot" and therefore may not appear in
the first 10kiB.

This commit adjusts this logging mechanism to split the message into
chunks of size at most 2kiB (after compression and base64-encoding) to
ensure that the entire hot threads output can be faithfully
reconstructed from these logs.

Closes #88126
2023-01-13 13:11:26 +00:00
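The chunking scheme described in #92851 can be illustrated with a minimal sketch. It is not the Elasticsearch implementation: the function names are hypothetical, gzip is assumed as the compression format (the commit message only says "compression"), and the 2 KiB limit is applied to the base64 text.

```python
import base64
import gzip
from typing import List

MAX_CHUNK_CHARS = 2048  # 2 KiB per log line, as stated in the commit message


def to_log_chunks(message: str, max_chunk: int = MAX_CHUNK_CHARS) -> List[str]:
    """Compress, base64-encode, then split into pieces of at most max_chunk characters."""
    # gzip is an assumption made for this sketch.
    encoded = base64.b64encode(gzip.compress(message.encode("utf-8"))).decode("ascii")
    return [encoded[i:i + max_chunk] for i in range(0, len(encoded), max_chunk)]


def from_log_chunks(chunks: List[str]) -> str:
    """Rejoin the chunks in their original order and reverse the encoding."""
    return gzip.decompress(base64.b64decode("".join(chunks))).decode("utf-8")
```

Because the chunks are cut from the encoded text rather than from the raw message, every chunk is needed, in order, to recover the original output.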
David Turner
6203560983
Fix docs for fault detection troubleshooting (#92749)
In #92742 we changed the logging around cluster membership changes but the docs
don't quite match the final version. This commit addresses that.
2023-01-09 10:17:06 +00:00
David Turner
5182748318
Improve node-{join,left} logging for troubleshooting (#92742)
Today to troubleshoot an unstable cluster we ask the users to parse the
rather complex `node-join` and `node-left` messages emitted by the
`MasterService`. These messages may refer to many nodes, may be
truncated, and are generally pretty hard to work with.

With this commit we start to emit a simplified log message about each
node added and removed. It also renames the respective executor classes:

- `JoinTaskExecutor` -> `NodeJoinExecutor`
- `NodeRemovalClusterStateTaskExecutor` -> `NodeLeftExecutor`

This brings their names in line with each other and with the messages
that they emit, whilst preserving the older `node-join` and `node-left`
terminology as reported by the `MasterService`.

Finally, it updates the troubleshooting logs to reflect these new and
simplified logs.

Relates #92741
2023-01-09 04:34:41 -05:00
Luiz Guilherme Pais dos Santos
9eec322424
Fix format for cluster.discovery_configuration_check.interval (#90452)
2022-12-22 16:11:33 +01:00
amyjtechwriter
e130617b1b
putting Miscellaneous cluster settings on its own page (#92150)
2022-12-06 14:29:20 +00:00
David Turner
c9ae9123fe
Add docs for desired balance allocator (#92109)
These docs cover the new allocator and the settings controlling the
heuristics for combining disk usage and write load into the overall
weight.
2022-12-06 11:10:18 +00:00
Nick Canzoneri
2b268d359d
[docs] Update search-settings documentation to reflect the fact that the indices.query.bool.max_clause_count setting has been deprecated (#91811)
* Update search-settings documentation to reflect the fact that the indices.query.bool.max_clause_count setting has been deprecated

* Fix indentation

* Replace Elasticsearch with {es}

* Add deprecation entry to release notes

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2022-11-29 18:10:30 +01:00
Tim Brooks
c1b39322af
Update network threading documentation (#91027)
Currently the documentation on network threading suggests that we still
use a model where we have individual workers dedicated to server
sockets. That is no longer true and server sockets are assigned to
normal workers. This commit updates the documentation.
2022-11-08 09:19:39 -05:00
Frederic Dartayre
fe0036fdbf
Update threadpool.asciidoc (#90098)
* Update threadpool.asciidoc

Starting from 8.0 the value of the `node.processors` setting is bounded by the number of available
processors (https://github.com/elastic/elasticsearch/pull/44894)

* Update docs/reference/modules/threadpool.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-10-26 14:04:39 -04:00
Iraklis Psaroudakis
0f4374f4fb
Explain disk headroom settings more in docs (#90763)
Relates to #81406
2022-10-20 18:45:23 +03:00
Paramdeep Singh
34ff7a9d98
Consolidated Circuit Breaker documentation to include EQL and ML infer (#90809)
Fixes #85851 

Co-authored-by: Iraklis Psaroudakis <kingherc@gmail.com>
2022-10-14 14:33:52 +03:00
Luca Cavanna
18942d5b11
Enhance nested depth tracking when parsing queries (#90425)
When parsing queries on the coordinating node, there is currently no way to share state between the different parsing methods (`fromXContent`). The only query that supports a parse context is the bool query, which uses the context to track the nested depth of queries, added with #66204. This nested depth tracking mechanism is not 100% accurate because it tracks bool queries only, while many more query types can hold other queries and hence potentially cause a stack overflow when deeply nested.

This change removes the parsing context that's specific to bool query, introduced with #66204, in favour of generalizing the nested depth tracking to all query types.

The generic tracking is introduced by wrapping the parser and overriding the method that parses named objects through the xcontent registry. Another way would have been to require a context argument when parsing queries, which would mean adding a context argument to all the QueryBuilder#fromXContent static methods. That would be a breaking change for plugins that provide custom queries, hence I went for trying out a different approach.

One aspect that this change requires and introduces is the distinction between parsing a top level query (which will wrap the parser, or it would create the context if we had one), as opposed to parsing an inner query, which goes ahead with the given parser and context. We already have this distinction as we have two different static methods in `AbstractQueryBuilder` but in practice only bool query makes the distinction being the only context-aware query.

In addition to generalizing nested depth tracking when parsing queries, we should be able to adopt this same strategy to track query usage as part of #90176.

Given that the depth check is now more restrictive, as it counts all compound queries and not only bool, we have decided to raise the default limit to `30` to ensure that users are not going to hit the limit due to this change.
2022-10-12 15:15:06 +02:00
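The idea in #90425 of tracking depth in one shared place, rather than in each query type, can be sketched as follows. This is an illustration only: the `DepthTracker` class, the toy `parse_query` function, and the `children` key are hypothetical and do not mirror the Elasticsearch parser, and the exact boundary behaviour at the limit is an assumption.

```python
from contextlib import contextmanager


class DepthTracker:
    """Shared state that counts how deeply queries are currently nested."""

    def __init__(self, max_depth: int = 30):  # 30 is the default named in the commit message
        self.max_depth = max_depth
        self.depth = 0

    @contextmanager
    def nested(self):
        self.depth += 1
        try:
            if self.depth > self.max_depth:
                raise ValueError(f"query nested deeper than {self.max_depth} levels")
            yield
        finally:
            # Always unwind, so the tracker is reusable after a failure.
            self.depth -= 1


def parse_query(query: dict, tracker: DepthTracker) -> int:
    """Toy parser: every query type passes through the same tracker.

    Returns the number of queries visited.
    """
    with tracker.nested():
        return 1 + sum(parse_query(child, tracker) for child in query.get("children", []))
```

Since the check lives in the tracker, any compound query type is counted without needing its own depth logic, which is the point of generalizing beyond bool queries.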
David Turner
c95fb2f3e8
More opinionated docs about http.max_content_length (#90500)
Adds to the docs a note that the `100mb` default for
`http.max_content_length` is the recommended maximum, along with
suggestions for what to do when hitting this limit.
2022-09-29 16:07:38 +01:00
Iraklis Psaroudakis
34471b1cd2
Introduce max headroom for disk watermark stages (#88639)
Introduce max headroom settings for the low, high, and flood disk watermark stages, similar to the existing max headroom setting for the flood stage of the frozen tier. Introduce new max headrooms in HealthMetadata and in ReactiveStorageDeciderService. Add multiple tests in DiskThresholdDeciderUnitTests, DiskThresholdDeciderTests and DiskThresholdMonitorTests. Also add addition, subtraction, and min operations for ByteSizeValue.
2022-09-19 14:59:18 +03:00
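One way to read the max headroom idea in #88639 is as a cap on the free space that a percentage-based watermark would otherwise require on large disks. The sketch below works through that arithmetic. It is an assumption-laden illustration: the function name and parameters are hypothetical, and the commit message does not spell out the exact rule.

```python
from typing import Optional


def required_free_bytes(total_bytes: int,
                        watermark_used_percent: int,
                        max_headroom_bytes: Optional[int] = None) -> int:
    """Free space a node must keep before a watermark stage is considered exceeded.

    Assumed rule: the percentage-based requirement, capped by the max headroom
    when one is configured.
    """
    # Integer arithmetic avoids floating point rounding on large disks.
    percent_based = total_bytes * (100 - watermark_used_percent) // 100
    if max_headroom_bytes is None:
        return percent_based
    return min(percent_based, max_headroom_bytes)
```

Under this reading, an 85% watermark on a 10 TiB disk would demand 1.5 TiB free, and a 200 GiB headroom reduces that to 200 GiB, while a 100 GiB disk is unaffected because 15 GiB is already below the cap.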
Leaf-Lin
65b05f858e
Add default value for destructive_requires_name (#85591)
As per https://github.com/elastic/elasticsearch/pull/66908, the setting now defaults to `True`, but this is not shown in the docs. Can we please have the docs updated?
2022-08-25 08:44:53 -04:00
Francisco Fernández Castaño
837a8d7a6e
Add support for floating point node.processors setting (#89281)
This commit adds support for a floating point node.processors setting.
This is useful when the nodes run in an environment where the CPU
time assigned to the ES node process is limited (e.g. using cgroups).
With this change, the system can size the thread pools accordingly:
it rounds the provided setting up to the nearest integer.
2022-08-17 15:00:39 +02:00
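The rounding described in #89281 amounts to a ceiling operation. A minimal sketch, with a hypothetical function name and an input check added for illustration:

```python
import math


def processors_for_thread_pools(node_processors: float) -> int:
    """Round a fractional processor allocation up to a whole number of processors."""
    if node_processors <= 0:
        # Guard added for this sketch; the commit message does not discuss invalid values.
        raise ValueError("node.processors must be positive")
    return math.ceil(node_processors)
```

For example, a node limited to 1.5 CPUs would size its thread pools as if it had 2 processors.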
David Turner
616fd07278
Drop transport client from ping_schedule docs (#89264)
The docs for `transport.ping_schedule` note that the transport client
defaults to a 5s ping schedule, but this is no longer relevant. This
commit drops this from the docs, and also moves the docs for this
setting further down the page to reflect its relative unimportance.
2022-08-11 09:25:14 +01:00
David Turner
c9d4892929
Weaken language about "low-latency" networks (#89198)
Today we say that voting-only nodes require a "low-latency" network.
This term has a specific meaning in some operating environments which is
different from our intended meaning. To avoid this confusion this commit
removes the absolute term "low-latency" in favour of describing the
requirements relative to the user's own performance goals.
2022-08-09 13:15:37 +01:00
David Turner
d5ea39b2e8
Clean up network setting docs (#88929)
Clean up network setting docs

- Add types for all params
- Remove mention of JDKs before 11
- Clarify some wording

Co-authored-by: Stef Nestor <steffanie.nestor@gmail.com>
2022-08-01 19:59:50 +01:00
David Turner
41a607af2e
Fix typo (missing word) (#88034)
2022-07-28 00:53:35 +09:30
Pooya Salehi
806d2976aa
Remove Blocks when disk threshold monitoring is disabled (#87841)
This change ensures that existing read_only_allow_delete blocks, which
are placed on indices when the flood_stage watermark threshold is
exceeded, are removed when disk threshold monitoring is disabled.

This is done by changing how InternalClusterInfoService behaves when
disabled. With this change, it will keep calling the registered
listeners periodically, but with an empty ClusterInfo.

Closes #86383
2022-07-26 14:26:43 +02:00
Nikolaj Volgushev
b04c0f3c3a
Increase http.max_header_size default to 16kb (#88725)
Our current default for the http.max_header_size setting is 8kb. This
is lower than the current default for Kibana (16kb in 8.x), and the ESS
proxy (1mb based on the Go http library default). To align with the
current convention of other Elastic components, this PR increases the
ES header size setting default to 16kb.

Closes #88501
2022-07-25 12:57:28 +02:00
Iraklis Psaroudakis
f284cc16f4
Convert disk watermarks to RelativeByteSizeValues (#88719)
* Convert disk watermarks to RelativeByteSizeValues

Similar to the existing watermark setting for the frozen tier.

Pre-requisite for PR 88639 that plans to introduce max headroom
settings for the disk watermarks, similar to the frozen tier max
headroom setting.

* Add changelog

* Revert 20gb to 20GB

* Make formatNoTrailingZerosPercent non static

* ByteSizeValue.MINUS_ONE

* Remove getMinimumTotalSizeForBelowWatermark

* Remove comment

* Fix minor stuff

* Make parsing of RelativeByteSizeValue faster

Mimics older definitelyNotPercentage function

* Remove Locale from Strings.format

* More MINUS_ONE
2022-07-22 18:39:07 +03:00
Leaf-Lin
945cb27782
[DOCS] Adding discovery troubleshooting link in the master get help page (#87344)
* Adding discovery troubleshooting link

* Add tags to pull in discovery troubleshooting content

* Move discovery troubleshooting to separate page and add redirects

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-07-06 15:51:43 -04:00
Iraklis Psaroudakis
50d2cf31b8
Periodic warning for 1-node cluster w/ seed hosts (#88013)
For fully-formed single-node clusters, emit a periodic warning if seed_hosts has been set to a non-empty list.

Closes #85222
2022-06-30 16:35:15 +03:00
David Turner
80f7af58f8
More detail in discovery troubleshooting docs (#86930)
In #85074 we added docs on discovery troubleshooting that really only
talked about troubleshooting master elections. There's also the case
where the master is elected fine but some other node can't join it. This
commit adds troubleshooting docs about that too.

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-06-06 08:33:45 +01:00
Pooya Salehi
beadcaf631
Increase force_merge threadpool size (#87082)
Changes the default size used for the force_merge threadpool to 1/8 of
the allocated processors, with a minimum value of 1.

Closes #84943
2022-05-25 15:45:28 +02:00
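The sizing rule from #87082 can be written out as a small calculation. The function name is hypothetical, and rounding down before applying the minimum is an assumption, since the commit message only says "1/8 of the allocated processors, with a minimum value of 1".

```python
def force_merge_pool_size(allocated_processors: int) -> int:
    """One eighth of the allocated processors, but never fewer than one thread."""
    # Floor division is an assumption about how the fraction is rounded.
    return max(1, allocated_processors // 8)
```

Under this rule, nodes with up to 15 processors keep a single force-merge thread, and a 64-processor node gets 8.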
Joe Gallo
79990fa49b
Remove "Push back excessive requests for stats (#83832)" (#87054)
2022-05-23 12:58:02 -04:00
David Turner
79f181d208
Reduce resource needs of join validation (#85380)
Fixes a few scalability issues around join validation:

- compresses the cluster state sent over the wire
- shares the serialized cluster state across multiple nodes
- forks the decompression/deserialization work off the transport thread

Relates #77466
Closes #83204
2022-04-26 12:15:54 +01:00
David Turner
33a553f61f
Fix up whitespace error introduced in #85948
2022-04-19 07:58:10 +01:00
David Turner
ce004d49e7
More docs re. removing cluster.initial_master_nodes (#85948)
Ensures that on every page of the docs that mentions
`cluster.initial_master_nodes` also mentions that this setting must be
removed after bootstrapping completes.
2022-04-19 07:54:43 +01:00
David Turner
6a273886e9
Add technical docs on diagnosing instability etc (#85074)
Copies some internal troubleshooting docs to the reference manual for
wider use.

Co-authored-by: James Rodewig <james.rodewig@gmail.com>
2022-03-31 09:01:10 +01:00
James Rodewig
73e56e3cf8
[DOCS] Reuse data tier content in node role docs (#84346)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2022-03-28 14:32:01 -07:00
David Turner
fd76f9c5d1
Fix auto-bootstrap docs (#85215)
Today it's no longer true that by default nodes will auto-discover other
nodes on the same host and bootstrap them all into a cluster. This
commit fixes the docs on auto-bootstrapping to recognise this.
2022-03-22 16:35:48 +00:00
David Turner
ff742fcb27
More balanced docs about NFS etc (#85060)
Today we don't really say anything about the requirements for the data
path in terms of correctness, and we specifically say to avoid NFS for
performance reasons. This isn't wholly accurate: some NFS
implementations work just fine. This commit documents a more balanced
position on local vs remote storage.
2022-03-18 13:01:59 +00:00
Mary Gouseti
ed0bb2a8af
Push back excessive requests for stats (#83832)
Resolves #51992
2022-02-28 08:46:18 +01:00