970 commits

Author SHA1 Message Date
David Turner
5dff56a00e
Mention network handler logging in docs (#100118)
Mentions the `InboundHandler` (and `OutboundHandler`) as potential
sources of useful log messages when tracking down a network threading
bug.
2023-10-02 08:52:16 +01:00
James Rodewig
4da2d31390
[main] [DOCS] Fix typo in query_cache.asciidoc (#99713) (#99810)
Co-authored-by: Joseph AFARI <71259267+joeafari@users.noreply.github.com>
2023-09-22 08:58:05 -04:00
James Rodewig
255c9a7f95
[DOCS] Move x-pack docs to docs/reference dir (#99209)
**Problem:**
For historical reasons, source files for the Elasticsearch Guide's security, watcher, and Logstash API docs are housed in the `x-pack/docs` directory. This can confuse new contributors who expect Elasticsearch Guide docs to be located in `docs/reference`. 

**Solution:**
- Move the security, watcher, and Logstash API doc source files to the `docs/reference` directory
- Update doc snippet tests to use security

Rel: https://github.com/elastic/platform-docs-team/issues/208
2023-09-12 14:53:41 -04:00
Abdon Pijpelink
54f6e4f51b
[DOCS] Remove 'coming in 8.10' from remote cluster API key auth docs (#99462)
2023-09-12 13:25:56 +02:00
Abdon Pijpelink
af76a3a436
[DOCS] Add 'Troubleshooting an unstable cluster' to nav (#99287)
* [DOCS] Add 'Troubleshooting an unstable cluster' to nav

* Adjust docs links in code

* Revert "Adjust docs links in code"

This reverts commit f3846b1d78.

---------

Co-authored-by: David Turner <david.turner@elastic.co>
2023-09-08 13:42:50 +02:00
Abdon Pijpelink
0421c4fe9b
[DOCS] Remote cluster troubleshooting guide (#99128)
* [DOCS] Remote cluster troubleshooting guide

* Fix test failures

* Apply suggestions from code review

Co-authored-by: Yang Wang <ywangd@gmail.com>

* Review feedback

* Group issues under 'common' and 'API key'

* Apply suggestions from code review

Co-authored-by: Yang Wang <ywangd@gmail.com>

---------

Co-authored-by: Yang Wang <ywangd@gmail.com>
2023-09-05 15:10:45 +02:00
Yang Wang
ebe4fe9f15
[Doc] Add links to the new API key based remote cluster page (#99115)
This PR adds links to the new API key based remote cluster page in
multiple places.

Relates: #98330
2023-09-01 06:08:49 -04:00
Abdon Pijpelink
4f1bf97776
[DOCS] Expand the step that enables the remote cluster server (#99084)
* [DOCS] Expand the step that enables the remote cluster server

* Update docs/reference/modules/cluster/remote-clusters-api-key.asciidoc

* Reword

* Reword
2023-09-01 10:35:46 +02:00
Abdon Pijpelink
792f9c1647
[DOCS] Remote cluster migration guide (#98999)
* [DOCS] Remote cluster migration guide

* Review feedback

* Clarify that any extra local privileges will be suppressed by the cross-cluster API key’s privileges
2023-08-31 10:24:20 +02:00
Stef Nestor
de380ea2af
[DOC+] Write threadpool also covers ingest pipelines (#99010)
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-08-29 13:51:18 -04:00
Abdon Pijpelink
1955bd8ad4
[DOCS] New docs for remote clusters using API key authentication (#98330)
* New docs structure for remote clusters

* Fix broken cross-book link errors

* More broken cross-book link errors

* Remove redirects for new pages

* Link to generic remote cluster docs instead

* Drop 'API' from the abbreviated title

* Add 'Establish trust with a remote cluster' section

* Restructure 'Establish trust' section into Prerequisite/local/remote instructions

* Add 'Configure roles and users' section

* Add 'Connect to a remote cluster' section

* Move version compatibility to prerequisites

* Fix test errors

* Incorporate review feedback

* Mention version 8.10 or later in the intro for API keys

* Add license prerequisite
2023-08-24 12:30:03 +02:00
Roberto Seldner
79d2879564
Add deprecated note for balanced allocator (#98610)
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-08-17 12:50:52 -04:00
Yang Wang
b337f9b6f3
[Docs] Misc doc update for RCS 2.0 (#98472)
This PR adds docs for the following items:
* Remote indices privileges
* Remote cluster network settings
* Remote cluster security settings
* New privileges
* New response field for RemoteInfo API

List of preview pages:
* [Remote indices in defining roles](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/defining-roles.html#roles-remote-indices-priv)
* [Remote indices in PutRole API](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/security-api-put-role.html#security-api-put-role-request-body)
* [Remote cluster server SSL settings](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/security-settings.html#_remote_cluster_server_api_key_based_model_tlsssl_settings)
* [Remote cluster client SSL settings](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/security-settings.html#_remote_cluster_client_api_key_based_model_tlsssl_settings)
* [Remote cluster network settings](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/modules-network.html#remote-cluster-network-settings) and [here](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/modules-network.html#common-network-settings)
* [Remote cluster credentials setting](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/remote-clusters-settings.html)
* [New privileges](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/security-privileges.html)
* [New response field for RemoteInfo API](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/cluster-remote-info.html#cluster-remote-info-api-response-body)
2023-08-15 20:11:21 -04:00
Luca Cavanna
4023454483
Introduce executor for concurrent search (#98204)
This commit enables concurrent search execution in the DFS phase, which is going to improve resource usage as well as performance of knn queries which benefit from both concurrent rewrite and collection.

We will enable concurrent execution for the query phase in a subsequent commit. While this commit does not introduce parallelism for the query phase, it does offload sequential computation to the newly introduced executor. This applies both when a single slice needs to be searched and when a specific request does not support concurrency (currently only the DFS phase supports it, regardless of the request). Sequential collection is offloaded unless the request includes aggregations that don't support offloading: composite, nested and cardinality. Their post-collection method must be executed in the same thread as the collection, or we'll trip a Lucene assertion that verifies that doc_values are pulled and consumed from the same thread.

## Technical details

This commit introduces a secondary executor, used exclusively to execute the concurrent bits of search. The search threads are still the ones that coordinate the search (where the caller search will originate from), but the actual work will be offloaded to the newly introduced executor.

We are offloading not only parallel execution but also sequential execution, to make the workload more predictable, as it would be surprising to have bits of search executed in either of the two thread pools. Also, that would introduce the possibility to suddenly run a higher amount of heavy operations overall (some in the caller thread and some in the separate threads), which could overload the system as well as make sizing of thread pools more difficult.

Note that fetch, together with other actions, is still executed in the search thread pool. This commit does not make the search thread pool a coordinating-only thread pool. It does so only for the IndexSearcher#search operation itself, which nonetheless accounts for a big portion of the different phases of search API execution.

Given that the searcher blocks waiting for all tasks to be completed, we take a simple approach of introducing a thread pool executor that has the same size as the existing search thread pool but relies on an unbounded queue. This simplifies handling of thread pool queue and rejections. In fact, we'd like to guarantee that the secondary thread pool won't reject, and delegate queuing entirely to the search thread pool which is the entry point for every search operation anyway. The principle behind this is that if you got a slot in the search thread pool, you should be able to complete your search, and rather quickly.

As part of this commit we are also introducing the ability to cancel tasks that have not started yet, so that if any task throws an exception, other tasks are prevented from starting needless computation.

Relates to #80693
Relates to #90700
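
The offloading and cancellation behaviour described in this commit can be illustrated roughly as follows. This is a minimal Python sketch, not the Elasticsearch implementation: `search_slices`, `search_fn` and the pool sizes are hypothetical names chosen for illustration, and Python's `ThreadPoolExecutor` stands in for the secondary executor because it also queues work without a bound.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION


def search_slices(executor, slices, search_fn):
    """Run search_fn over every slice on the secondary executor.

    The caller blocks until all slices finish. If any slice fails,
    slices that have not started yet are cancelled and the failure
    is re-raised to the caller.
    """
    # Even a single slice is offloaded, so all search work runs in one pool.
    futures = [executor.submit(search_fn, s) for s in slices]

    # Returns as soon as one task raises, or when all tasks are complete.
    done, pending = wait(futures, return_when=FIRST_EXCEPTION)

    # cancel() only succeeds for tasks that have not started running.
    for future in pending:
        future.cancel()

    for future in done:
        error = future.exception()
        if error is not None:
            raise error

    # No failures: every future is complete, so result() does not block.
    return [future.result() for future in futures]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as secondary:
        print(search_slices(secondary, ["slice-0", "slice-1"], str.upper))
```

Because the queue is unbounded, submissions to the secondary pool are never rejected; back-pressure is left to whichever pool admits the request in the first place, which matches the intent stated in the commit message.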
2023-08-10 12:40:36 +02:00
David Turner
0f6a217ed8
Fix admonition about initial_master_nodes (#98242)
Admonition paragraphs cannot be combined with a `+` continuation mark.
This commit fixes the formatting by using an admonition block instead.
2023-08-08 11:50:36 +01:00
David Turner
847ec45baa
Remove bound on SEARCH_COORDINATION default size (#98264)
Today by default the `SEARCH_COORDINATION` pool is sized at half the
allocated processors, or five if there are more than ten CPUs. Yet, if
we scale up a node to have more than ten CPUs, we probably want to scale
up the number of search coordination threads to match. This commit
removes the limit of five threads.
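
As an illustration of the sizing change, here is a minimal Python sketch of the two rules as this message describes them. The function names are hypothetical, and both the round-up of odd processor counts and the lower bound of one thread are assumptions rather than details taken from the commit.

```python
def old_search_coordination_size(allocated_processors: int) -> int:
    """Previous default: half the allocated processors, capped at five."""
    # Rounding up and the minimum of one thread are assumptions.
    half = max(1, (allocated_processors + 1) // 2)
    return min(half, 5)


def new_search_coordination_size(allocated_processors: int) -> int:
    """New default: half the allocated processors, with no upper bound."""
    return max(1, (allocated_processors + 1) // 2)


if __name__ == "__main__":
    for cpus in (4, 10, 16, 32):
        print(cpus, old_search_coordination_size(cpus), new_search_coordination_size(cpus))
```

Under these assumptions the two rules agree up to ten processors and diverge above that: a 16-processor node moves from five threads to eight.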
2023-08-08 07:09:25 +01:00
Pooya Salehi
966eb022d9
[DOCS] Mention mmap and FD limits when increasing default max shard per node (#97975)
2023-07-26 16:45:27 +02:00
David Turner
09e53f9ad9
Enhance docs around network troubleshooting (#97305)
Discovery, like cluster membership, can also be affected by network-like
issues (e.g. GC/VM pauses, dropped packets and blocked threads) so this
commit duplicates the troubleshooting info across both places.
2023-07-10 10:57:44 +01:00
James Rodewig
ff84ad1469
[DOCS] Note license requirements for CCS (#97252)
Notes that CCS requires both clusters to use the same license level for full capabilities.
2023-06-29 16:55:10 -04:00
David Turner
2a49ad929c
Slightly better hot threads for transport workers (#96315)
A completely idle `transport_worker` thread is reported as `0.0%` idle,
which is confusing. Moreover the docs on the network threading model do
not reflect the changes made in #90482. This commit fixes both of those
things.
2023-05-25 12:08:08 +01:00
debadair
777598d602
[DOCS] Remove redirect pages (#88738)
* [DOCS] Remove manual redirects

* [DOCS] Removed refs to modules-discovery-hosts-providers

* [DOCS] Fixed broken internal refs

* Fixing bad cross links in ES book, and adding redirects.asciidoc[] back into docs/reference/index.asciidoc.

* Update docs/reference/search/point-in-time-api.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/setup/restart-cluster.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/sql/endpoints/translate.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/snapshot-restore/restore-snapshot.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update repository-azure.asciidoc

* Update node-tool.asciidoc

* Update repository-azure.asciidoc

---------

Co-authored-by: amyjtechwriter <61687663+amyjtechwriter@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-05-24 12:32:46 +01:00
David Turner
7a517cb4a0
Add note on jstack frequency for troubleshooting (#95764)
Suggest calling `jstack` every 15s to ensure that at least one capture
shows a stuck thread. Also adds a link to this guide to the list on the
troubleshooting overview page.
2023-05-03 10:04:13 +01:00
David Turner
822dc713d8
Add note on name resolution during startup (#95266)
Notes that the transport publish address is resolved once during
startup, plus advice to ensure that this name resolution doesn't vary by
location.
2023-04-17 14:42:15 +01:00
David Turner
f0989404ab
Bootstrapping docs clarifications (#94977)
Explains why you should remove `cluster.initial_master_nodes`, and
rewords some of the other sections a little for (subjectively) improved
readability.
2023-04-03 14:43:12 +01:00
David Kilfoyle
7cd484ac95
Revert "Cross-reference disclaimer" (#94829)
* Revert "Cross-reference disclaimer (#94801)"

This reverts commit 902649be31.

* Highlight sentences about removing `cluster.initial_master_nodes` setting
2023-03-28 11:17:53 -04:00
Stef Nestor
902649be31
Cross-reference disclaimer (#94801)
👋🏼 howdy, team! Can we cross pollinate the [important banner](https://www.elastic.co/guide/en/elasticsearch/reference/master/important-settings.html#initial_master_nodes) from the `cluster.initial_master_nodes` setting page to the related [bootstrap doc](https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-discovery-bootstrap-cluster.html#bootstrap-cluster-name) to avoid users misunderstanding the latter's "This is only required the first time a cluster starts up" as saying they don't need to comment-out these settings?
2023-03-28 00:16:40 -04:00
David Turner
421c2d4731
Add request/response body logging to HTTP tracer (#93133)
Adds another logger, `org.elasticsearch.http.HttpBodyTracer`, which logs
the body of every HTTP request and response as well as the usual
summaries.
2023-03-15 11:13:36 -04:00
David Kilfoyle
32f7d046b7
Update http.max_content_length description (#94430)
2023-03-09 10:58:50 -05:00
Stef Nestor
a2837a2e3f
Confirm http.max_content_length for compressed (#94408)
👋 Per [StackOverflow](https://stackoverflow.com/questions/55724839/elasticsearch-http-max-content-length-when-compressed), can we append that `http.max_content_length` applies to the compressed HTTP size.
2023-03-09 09:21:36 -05:00
Abdon Pijpelink
2808512397
[DOCS] Improve watermark troubleshooting documentation (#94222)
2023-03-01 14:34:14 +01:00
Zacson
031854b6f4
[DOCS] Correct the calculation rules for limiting the total number of cluster frozen shards (#93764)
2023-02-23 10:30:03 +01:00
Pooya Salehi
93a897c89d
Update snapshot threadpool size doc (#93655)
Co-authored-by: David Turner <david.turner@elastic.co>
2023-02-09 17:45:45 +01:00
Sylvain Wallez
484d3f4ada
Fixes CORS headers needed by Elastic clients (#85791)
* Fixes CORS headers needed by Elastic clients

Updates the default value for the `http.cors.allow-headers`
setting to include headers used by Elastic client libraries.

Also adds the `access-control-expose-headers` header to responses to
CORS requests so that clients can successfully perform their product
check.
2023-02-09 16:44:37 +01:00
Daniel Mitterdorfer
5ec28cc875
Document correct get thread pool size (#93541)
In #92309 we have aligned the size of the `search` and the `get` thread
pool but the docs still contain the prior `get` thread pool size. With
this commit we also align the docs.

Relates #92309
2023-02-08 07:19:55 +01:00
David Turner
4c68382065
Capture thread dump on ShardLockObtainFailedException (#93458)
We sometimes see a `ShardLockObtainFailedException` when a shard failed
to shut down as fast as we expected, often because a node left and
rejoined the cluster. Sometimes this is because it was held open by
ongoing scrolls or PITs, but other times it may be because the shutdown
process itself is too slow. With this commit we add the ability to
capture and log a thread dump at the time of the failure to give us more
information about where the shutdown process might be running slowly.

Relates #93226
2023-02-02 11:17:40 -05:00
Stef Nestor
eb1de9493e
[+DOC] node_concurrent_recoveries default (#90330)
Notes that `node_concurrent_recoveries` default is 2 (same as both sub-settings which already note that).
2023-01-18 13:53:48 +01:00
David Turner
dfab580976
Limit length of lag detector hot threads log lines (#92851)
If debug logging is enabled then the lag detector will capture and
report the hot threads of a lagging node. In some cases the resulting
log message can be very large, exceeding 10kiB, which means it is
truncated in most logging setups. The relevant thread(s) may be waiting
on I/O, which is not considered "hot" and therefore may not appear in
the first 10kiB.

This commit adjusts this logging mechanism to split the message into
chunks of size at most 2kiB (after compression and base64-encoding) to
ensure that the entire hot threads output can be faithfully
reconstructed from these logs.

Closes #88126
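
The chunking scheme described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Elasticsearch code: gzip is assumed as the compression format, and `to_log_chunks` and `from_log_chunks` are hypothetical names.

```python
import base64
import gzip
from typing import List

# Maximum size of each logged chunk, measured after compression and encoding.
MAX_CHUNK_CHARS = 2 * 1024


def to_log_chunks(message: str) -> List[str]:
    """Compress and base64-encode a message, then split it into chunks."""
    encoded = base64.b64encode(gzip.compress(message.encode("utf-8"))).decode("ascii")
    return [encoded[i:i + MAX_CHUNK_CHARS] for i in range(0, len(encoded), MAX_CHUNK_CHARS)]


def from_log_chunks(chunks: List[str]) -> str:
    """Rebuild the original message from its chunks, taken in order."""
    return gzip.decompress(base64.b64decode("".join(chunks))).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical stand-in for a large hot threads report.
    report = "\n".join(f"thread-{i}: waiting on I/O at frame {i * 7919}" for i in range(5000))
    chunks = to_log_chunks(report)
    assert all(len(chunk) <= MAX_CHUNK_CHARS for chunk in chunks)
    assert from_log_chunks(chunks) == report
    print(f"{len(report)} characters logged as {len(chunks)} chunks")
```

Splitting after encoding rather than before means every chunk stays under the size limit regardless of how well the text compresses, and concatenating the chunks in order is enough to recover the full output.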
2023-01-13 13:11:26 +00:00
David Turner
6203560983
Fix docs for fault detection troubleshooting (#92749)
In #92742 we changed the logging around cluster membership changes but the docs
don't quite match the final version. This commit addresses that.
2023-01-09 10:17:06 +00:00
David Turner
5182748318
Improve node-{join,left} logging for troubleshooting (#92742)
Today to troubleshoot an unstable cluster we ask the users to parse the
rather complex `node-join` and `node-left` messages emitted by the
`MasterService`. These messages may refer to many nodes, may be
truncated, and are generally pretty hard to work with.

With this commit we start to emit a simplified log message about each
node added and removed. It also renames the respective executor classes:

- `JoinTaskExecutor` -> `NodeJoinExecutor`
- `NodeRemovalClusterStateTaskExecutor` -> `NodeLeftExecutor`

This brings their names in line with each other, and the messages that
they emit, whilst preserving the older `node-join` and `node-left`
terminology as reported by the `MasterService`.

Finally, it updates the troubleshooting logs to reflect these new and
simplified logs.

Relates #92741
2023-01-09 04:34:41 -05:00
Luiz Guilherme Pais dos Santos
9eec322424
Fix format for cluster.discovery_configuration_check.interval (#90452)
2022-12-22 16:11:33 +01:00
amyjtechwriter
e130617b1b
putting Miscellaneous cluster settings on its own page (#92150)
2022-12-06 14:29:20 +00:00
David Turner
c9ae9123fe
Add docs for desired balance allocator (#92109)
These docs cover the new allocator and the settings controlling the
heuristics for combining disk usage and write load into the overall
weight.
2022-12-06 11:10:18 +00:00
Nick Canzoneri
2b268d359d
[docs] Update search-settings documentation to reflect the fact that the indices.query.bool.max_clause_count setting has been deprecated (#91811)
* Update search-settings documentation to reflect the fact that the indices.query.bool.max_clause_count setting has been deprecated

* Fix indentation

* Replace Elasticsearch with {es}

* Add deprecation entry to release notes

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2022-11-29 18:10:30 +01:00
Tim Brooks
c1b39322af
Update network threading documentation (#91027)
Currently the documentation on network threading suggests that we still
use a model where we have individual workers dedicated to server
sockets. That is no longer true and server sockets are assigned to
normal workers. This commit updates the documentation.
2022-11-08 09:19:39 -05:00
Frederic Dartayre
fe0036fdbf
Update threadpool.asciidoc (#90098)
* Update threadpool.asciidoc

Starting from 8.0 the value of the `node.processors` setting is bounded by the number of available
processors https://github.com/elastic/elasticsearch/pull/44894

* Update docs/reference/modules/threadpool.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-10-26 14:04:39 -04:00
Iraklis Psaroudakis
0f4374f4fb
Explain disk headroom settings more in docs (#90763)
Relates to #81406
2022-10-20 18:45:23 +03:00
Paramdeep Singh
34ff7a9d98
Consolidated Circuit Breaker documentation to include EQL and ML infer (#90809)
Fixes #85851 

Co-authored-by: Iraklis Psaroudakis <kingherc@gmail.com>
2022-10-14 14:33:52 +03:00
Luca Cavanna
18942d5b11
Enhance nested depth tracking when parsing queries (#90425)
When parsing queries on the coordinating node, there is currently no way to share state between the different parsing methods (`fromXContent`). The only query that supports a parse context is bool query, which uses the context to track nested depth of queries, added with #66204. This nested depth tracking mechanism is not 100% accurate as it tracks bool queries only, while there are many more query types that can hold other queries and hence potentially cause a stack overflow when deeply nested.

This change removes the parsing context that's specific to bool query, introduced with #66204, in favour of generalizing the nested depth tracking to all query types.

The generic tracking is introduced by wrapping the parser and overriding the method that parses named objects through the xcontent registry. Another way would have been to require a context argument when parsing queries, which would mean adding a context argument to all the QueryBuilder#fromXContent static methods. That would be a breaking change for plugins that provide custom queries, hence I went for trying out a different approach.

One aspect that this change requires and introduces is the distinction between parsing a top level query (which will wrap the parser, or it would create the context if we had one), as opposed to parsing an inner query, which goes ahead with the given parser and context. We already have this distinction as we have two different static methods in `AbstractQueryBuilder` but in practice only bool query makes the distinction being the only context-aware query.

In addition to generalizing tracking nested depth when parsing queries, we should be able to adopt this same strategy to track queries usage as part of #90176.

Given that the depth check is now more restrictive, as it counts all compound queries and not only bool, we have decided to raise the default limit to `30` to ensure that users are not going to hit the limit due to this change.
2022-10-12 15:15:06 +02:00
David Turner
c95fb2f3e8
More opinionated docs about http.max_content_length (#90500)
Adds to the docs a note that the `100mb` default for
`http.max_content_length` is the recommended maximum, along with
suggestions for what to do when hitting this limit.
2022-09-29 16:07:38 +01:00
Iraklis Psaroudakis
34471b1cd2
Introduce max headroom for disk watermark stages (#88639)
Introduce max headroom settings for the low, high, and flood disk watermark stages, similar to the existing max headroom setting for the flood stage of the frozen tier. Introduce new max headrooms in HealthMetadata and in ReactiveStorageDeciderService. Add multiple tests in DiskThresholdDeciderUnitTests, DiskThresholdDeciderTests and DiskThresholdMonitorTests. Also add addition, subtraction, and min operations for ByteSizeValue.
2022-09-19 14:59:18 +03:00