Commit graph

299 commits

Author SHA1 Message Date
Ryan Ernst
22a52a9c64
Remove security manager policy files (#127727)
Now that security manager is gone, the policy files are no longer
needed. This commit removes the server, test and plugin specific policy
files.
2025-05-06 19:37:46 +02:00
Ben Chaplin
053895854d
Always log data node failures (#127420)
Log search exceptions as they occur on the data node no matter the value 
of error_trace.
2025-04-29 09:40:31 -04:00
Luca Cavanna
df83e881f9
Cancel expired async search task when a remote returns its results (#126583)
A while ago we enabled using ccs_minimize_roundtrips in async search.
This makes it possible for users of async search to send a single search
request per remote cluster, and minimize the impact of network latency.

With non-minimized roundtrips, we have frequent cancellation checks:
as part of the execution, we detect that a task expired whenever a shard comes
back with its results.

In a scenario where the coord node does not hold data, or only remote data is
targeted by an async search, we have much less chance of detecting cancellation
if roundtrips are minimized. The local coordinator would do nothing other than
waiting for the minimized results from each remote cluster.
One scenario where we can check for cancellation is when each cluster comes
back with its full set of results. This commit adds such a check, plus some testing
for async search cancellation with minimized roundtrips.
2025-04-16 14:21:59 +02:00
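The check described in the commit above can be sketched roughly as follows. This is a minimal illustration only: `ExpiringSearchTask`, `onClusterResponse` and the injected clock are hypothetical names, not the Elasticsearch classes touched by the commit.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for an async search task; not the Elasticsearch class. */
final class ExpiringSearchTask {
    private final long expirationTimeMillis;
    private final LongSupplier clock; // injected so the sketch can be tested deterministically
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private int clustersCompleted = 0;

    ExpiringSearchTask(long expirationTimeMillis, LongSupplier clock) {
        this.expirationTimeMillis = expirationTimeMillis;
        this.clock = clock;
    }

    /** Called once per remote cluster when its full (minimized) result set arrives. */
    synchronized void onClusterResponse(String clusterAlias) {
        clustersCompleted++;
        // With minimized roundtrips there are no per-shard callbacks on the
        // coordinator, so this per-cluster callback is where expiry is noticed.
        if (clock.getAsLong() >= expirationTimeMillis) {
            cancelled.set(true);
        }
    }

    boolean isCancelled() {
        return cancelled.get();
    }

    synchronized int clustersCompleted() {
        return clustersCompleted;
    }
}
```

The point of the sketch is only where the check sits: with one response per cluster, the per-cluster callback is the remaining natural hook for detecting an expired task.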
Ben Chaplin
c11b8f130c
Remove unnecessary request from log tests (#126556) 2025-04-11 09:46:30 -04:00
Ryan Ernst
991e80d56e
Remove unnecessary generic params from action classes (#126364)
Transport actions have associated request and response classes. However,
the base type restrictions are not necessary to duplicate when creating
a map of transport actions. Relatedly, the ActionHandler class doesn't
actually need strongly typed action type and classes since they are lost
when shoved into the node client map. This commit removes these type
restrictions and generic parameters.
2025-04-07 16:22:56 -07:00
Ben Chaplin
9f6eb1d4e3
Log stack traces on data nodes before they are cleared for transport (#125732)
We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false.
2025-04-03 13:45:09 -04:00
Armin Braun
fd2cc97541
Introduce batched query execution and data-node side reduce (#121885)
This change moves the query phase to a single roundtrip per node, just as can_match and field_caps already work.
As a result of executing multiple shard queries from a single request, we can also partially reduce each node's query results on the data node side before responding to the coordinating node.

As a result this change significantly reduces the impact of network latencies on the end-to-end query performance, reduces the amount of work done (memory and cpu) on the coordinating node and the network traffic by factors of up to the number of shards per data node!

Benchmarking shows up to orders of magnitude improvements in heap and network traffic dimensions in querying across a larger number of shards.
2025-03-29 16:53:18 +01:00
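The idea of a data-node side partial reduce from the commit above can be illustrated with a small top-k merge. This is a sketch under simplifying assumptions: hits are reduced to bare scores, and `NodeSideReduce` and `topK` are hypothetical names rather than anything from the Elasticsearch code base.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper; illustrates partial reduction only. */
final class NodeSideReduce {
    /** Keeps the k highest scores from any number of per-shard (or per-node) lists. */
    static List<Double> topK(List<List<Double>> lists, int k) {
        List<Double> all = new ArrayList<>();
        for (List<Double> list : lists) {
            all.addAll(list);
        }
        all.sort(Comparator.reverseOrder());
        // Copy the sublist so the result does not hold on to the full list.
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Each data node would call `topK` over its own shards and send back at most k scores; the coordinating node then calls `topK` again over the per-node results. Because the global top k is always contained in the union of the per-node top k, the final result is unchanged while the coordinator receives one small list per node instead of one per shard.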
Rene Groeschke
ae569def9c
[Build] Require reason for usesDefaultDistribution (#124707)
This makes using usesDefaultDistribution in our test setup more explicit by requiring a reason why it's needed.
This is helpful as part of revisiting the need for all those usages in our code base.
2025-03-17 08:25:39 +01:00
Armin Braun
425823cb5c
Remove some overhead from TransportService message handling (#124428)
Avoiding some indirection, volatile-reads and moving the listener
functionality that needlessly kept iterating an empty CoW list (creating
iterator instances, volatile reads, more code) in an effort to improve
the low IPC on transport threads.
2025-03-09 16:00:11 +01:00
Simon Cooper
274be7997a
Create a SearchResponseBuilder for creating SearchResponses in tests (#122196)
As well as simplifying test code, this also highlights which settings in the response are actually needed for individual tests
2025-02-21 12:00:17 +00:00
Armin Braun
d3abf9d5ba
Dry up search error trace ITs (#122138)
This logic will need a bit of adjustment for bulk query execution.
Let's dry it up before so we don't have to copy and paste the fix which
will be a couple lines.
2025-02-10 08:48:49 +01:00
Pawan Kartik
01edab58ff
Fix NPE caused by race condition in async search when minimise round trips is true (#117504)
* Fix NPE caused by race condition in async search when minimise round trips is true

Previously, the `notifyListShards()` initialised and updated the
required pre-requisites (`searchResponse` being amongst them) when a
search op began. This function takes in arguments that contain
shard-specific details amongst others. Because this information is not
immediately available when the search begins, it is not immediately
called. In some specific cases, there can be a race condition that can
cause the pre-requisites (such as `searchResponse`) to be accessed
before they're initialised, causing an NPE.

This fix addresses the race condition by splitting the initialisation
and subsequent updates between 2 different methods. This way, the
pre-requisites are always initialised and do not lead to an NPE.

* Try: call `notifyListShards()` after `notifySearchStart()` when minimize round trips is true

* Add removed code comment

* Pass `Clusters` to `SearchTask` rather than using progress listener to
signify search start.

To prevent polluting the progress listener with unnecessary search
specific details, we now pass the `Clusters` object to `SearchTask` when
a search op begins. This lets `AsyncSearchTask` access it and use it to
initialise `MutableSearchResponse` appropriately.

* Use appropriate `clusters` object rather than re-building it

* Do not double set `mutableSearchResponse`

* Move mutable entities such as shard counts out of `MutableSearchResponse`

* Address PR review: revert moving out mutable entities from
`MutableSearchResponse`

* Update docs/changelog/117504.yaml

* Get rid of `SetOnce` for `searchResponse`

* Drop redundant check around shards count

* Add a test that calls `onListShards()` at last and clarify `updateShardsAndClusters()`'s comment

* Fix test: ref count

* Address review comment: rewrite comment and test
2025-01-27 18:49:13 +00:00
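The shape of the fix in the commit above, initialise at search start and only update when shard details arrive, might look roughly like this. The sketch is illustrative: `ResponseHolder` and its methods are hypothetical and do not mirror `MutableSearchResponse` or `AsyncSearchTask`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder; shows the init/update split only. */
final class ResponseHolder {
    // -1 means the shard count is not yet known.
    private final AtomicInteger totalShards = new AtomicInteger(-1);

    private ResponseHolder() {}

    /** Step 1: created when the search starts, before any shard details exist. */
    static ResponseHolder onSearchStart() {
        return new ResponseHolder();
    }

    /** Step 2: shard details arrive later and only update existing state. */
    void onListShards(int shards) {
        totalShards.set(shards);
    }

    /** Safe at any point after onSearchStart(): no field is ever left uninitialised. */
    String status() {
        int shards = totalShards.get();
        return shards < 0 ? "running (shards unknown)" : "running (" + shards + " shards)";
    }
}
```

Because the holder exists from the moment the search starts, a status request that races ahead of the shard listing sees a valid "unknown" state instead of a null reference.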
Matteo Piergiovanni
7666ffb44b
unmute testRestartAfterCompletion (#120361) 2025-01-17 13:54:07 +01:00
Rene Groeschke
ba61f8c7f7
Update Gradle wrapper to 8.12 (#118683)
This updates the gradle wrapper to 8.12

We addressed deprecation warnings due to the update that includes:

- Fix change in TestOutputEvent api
- Fix deprecation in groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
2024-12-30 15:34:24 +01:00
Matteo Piergiovanni
97bc2919ff
Prevent data nodes from sending stack traces to coordinator when error_trace=false (#118266)
* first iterations

* added tests

* Update docs/changelog/118266.yaml

* constant for error_trace and typos

* centralized putHeader

* moved threadContext to parent class

* uses NodeClient.threadpool

* updated async tests to retrieve final result

* moved test to avoid starting up a node

* added transport version to avoid sending useless bytes

* more async tests
2024-12-18 15:29:35 +01:00
Simon Cooper
09ce855d83
Remove some 7.7 and 7.8 transport version checks (#118563) 2024-12-16 09:08:06 +00:00
Armin Braun
eb0020f055
Introduce more parallelism into cross cluster test bootstrapping (#117820)
We can parallelize starting the clusters and a few other things
to effectively speed up these tests by 2x which comes out to about a minute
of execution time saved for all of those in :server:internalClusterTests
on my workstation.
2024-12-04 21:04:11 +01:00
Rene Groeschke
f6ac6e1c3b
[Build] Remove deprecated BuildParams (#116984) 2024-11-22 16:30:57 +01:00
Rene Groeschke
13c8aaeffa
[Gradle] Remove static use of BuildParams (#115122)
Static fields don't do well in Gradle with configuration cache enabled.

- Use buildParams extension in build scripts
- Keep BuildParams.ci for now for easy serverless migration
- Tweak testing doc
2024-11-15 17:58:57 +01:00
Matteo Piergiovanni
db2eca345d
fixed testCCSClusterDetailsWhereAllShardsSkippedInCanMatch (#115774) 2024-10-28 16:49:44 +01:00
Matteo Piergiovanni
7f573c6c28
Only aggregations require at least one shard request (#115314)
* unskipping shards only when aggs

* Update docs/changelog/115314.yaml

* fixed more tests

* null check for searchRequest.source()
2024-10-25 08:50:05 +02:00
Luca Cavanna
8efd08b019
Upgrade to Lucene 10 (#114741)
The most relevant ES changes that upgrading to Lucene 10 requires are:

- use the appropriate IOContext
- Scorer / ScorerSupplier breaking changes
- Regex automata are no longer determinized by default
- minimize moved to test classes
- introduce Elasticsearch900Codec
- adjust slicing code according to the added support for intra-segment concurrency
- disable intra-segment concurrency in tests
- adjust accessor methods for many Lucene classes that became a record
- adapt to breaking changes in the analysis area

Co-authored-by: Christoph Büscher <christophbuescher@posteo.de>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Brian Seeders <brian.seeders@elastic.co>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Panagiotis Bailis <pmpailis@gmail.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
2024-10-21 13:38:23 +02:00
Stanislav Malyshev
510a56bb96
Remove ccs_telemetry feature flag (#113825)
This removes the `ccs_telemetry` feature flag and instead introduces an
undocumented setting, `search.ccs.collect_telemetry`, which is true by
default. The setting enables CCS search telemetry collection and
`_cluster/stats?include_remote=true`, and can be disabled if it causes
any problems.
2024-10-10 07:07:30 +11:00
David Turner
8f07d60c2c
Fix trappy timeouts in o.e.a.a.cluster.* (#112674)
Removes all usages of `TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT` in
cluster-related APIs in `:server`.

Relates #107984
2024-09-10 08:17:09 +01:00
Iván Cea Fontenla
d59df8af3e
Async search: Add ID and "is running" http headers (#112431)
Add the async execution ID and "is running" flag in the response as HTTP headers.
This allows users to know the request status without having to parse the response body.
It was also implemented in the `/_async_search/status/<id>` endpoint for consistency.

Continuation of https://github.com/elastic/elasticsearch/pull/111840, which implemented this same thing for ESQL.
Fixes https://github.com/elastic/elasticsearch/issues/109576
2024-09-05 08:14:35 +02:00
Stanislav Malyshev
cb4d7ff281
Skip CCS Usage telemetry ITs if the feature flag is not enabled. (#112365) 2024-08-29 15:54:08 -06:00
Stanislav Malyshev
cf0e188728
Add isAsync() to SearchTask and eliminate code for async detection from TransportSearchAction (#112311) 2024-08-29 08:56:00 -06:00
Stanislav Malyshev
67d2380cbd
Collecting CCS usage telemetry stats (#111905)
* This creates the CCSUsage and CCSUsageTelemetry classes and wires them up to the UsageService.

An initial set of telemetry metrics is now being gathered in TransportSearchAction.
Many more will be added later to meet all the requirements for the CCS Telemetry epic of work.

Co-authored-by: Michael Peterson <michael.peterson@elastic.co>
2024-08-28 13:36:16 -06:00
Patrick Doyle
35a375329a
Move Guice to org.elasticsearch.injection.guice (#111723)
* Move files and fix imports & module exports
* Other consequences of moving Guice
2024-08-12 10:47:46 -04:00
Armin Braun
daf30f96dc
Introduce and use a few more empty response type constants to o.e.c.lucene.Lucene (#109619)
Shortening a few more pieces of production code using constants,
potentially saving a little in code size and allocation in some cases.
2024-06-12 14:01:44 +02:00
Panagiotis Bailis
4a1d7426d7
Adding RankFeature implementation (#108538) 2024-06-06 11:20:53 +03:00
David Turner
314e24f1f4
AwaitsFix for #81941 2024-05-14 20:35:59 +01:00
Armin Braun
bca53baa14
Handle PIT Id as BytesReference instead of String (#107989)
Handling the PIT id as a `BytesReference` instead of as base64 encoded string
saves about a third of network traffic for these. We know that PIT ids can be a significant
source of traffic so the savings are well worth it.
Also, this saves cycles and memory on all nodes involved. A follow-up here would be to explore
slicing these IDs out of the network buffer instead of copying them, to reduce memory usage and large
allocations.
2024-04-29 13:42:57 +02:00
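The size difference behind the commit above comes from base64 itself, which encodes every 3 bytes as 4 characters, so the encoded form is about a third larger than the raw bytes. A minimal sketch using only the JDK follows; `PitIdSize` is a hypothetical helper, and the 300-byte id in the usage note is an arbitrary illustrative size, not a measured PIT id.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper comparing the two representations of an opaque id. */
final class PitIdSize {
    /** Bytes needed when the id travels as a base64 string (one byte per ASCII character). */
    static int base64Size(byte[] rawId) {
        String encoded = Base64.getUrlEncoder().encodeToString(rawId);
        return encoded.getBytes(StandardCharsets.US_ASCII).length;
    }

    /** Bytes needed when the raw bytes travel as they are (ignoring any length prefix). */
    static int rawSize(byte[] rawId) {
        return rawId.length;
    }
}
```

For a 300-byte id, `rawSize` is 300 and `base64Size` is 400, so the string form carries 100 extra bytes per id on every request and response that includes it.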
Slobodan Adamović
27c6fc4794
[Test] Fix AsyncSearchResponse resource leak in security tests (#107809)
Closes #107759
2024-04-24 10:47:06 +02:00
David Turner
d7733c06e0
Make randomTimeValue() return a TimeValue (#107689)
Today various test utility methods construct a random string that can be
parsed as a `TimeValue`, but in practice almost everywhere we just parse
it straight away. With this change we have the utility methods return a
`TimeValue` directly.
2024-04-23 07:21:53 +01:00
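The refactoring pattern in the commit above, returning the typed value instead of a string that callers parse straight away, can be shown with the JDK's `java.time.Duration` standing in for `TimeValue`. The class and method names below are hypothetical and are not the Elasticsearch test utilities.

```java
import java.time.Duration;
import java.util.Random;

/** Hypothetical test utility; Duration stands in for TimeValue. */
final class RandomDurations {
    /** Before: a string that nearly every caller immediately parses again. */
    static String randomDurationString(Random random) {
        return "PT" + (1 + random.nextInt(60)) + "S";
    }

    /** After: hand back the typed value directly. */
    static Duration randomDuration(Random random) {
        return Duration.ofSeconds(1 + random.nextInt(60));
    }
}
```

With the second method, a call site shrinks from `Duration.parse(randomDurationString(random))` to `randomDuration(random)`, and the few callers that really need text can still call `toString()` on the result.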
Michael Peterson
fde150011d
Users with monitor privileges can access async_search/status endpoint even when setting keep_alive (#107383)
Fixes a bug in the async-search status endpoint where a user with monitor privileges
is not able to access the status endpoint when setting keep_alive state of the async-search.
2024-04-18 08:51:01 -04:00
Armin Braun
3f20708965
Remove unused NamedWriteableRegistry from search REST actions (#107126)
We don't need `NamedWriteableRegistry` to parse search requests any longer,
this was an unused parameter. Removing it from search request parsing allows
for removing it as a dependency from a number of places.
2024-04-05 10:51:53 +02:00
Albert Zaharovits
89cfb85c82
[Test] Fix AsyncSearchSecurityIT testStatusWithUsersWhileSearchIsRunning (#106912)
The error_query is only available in snapshot builds.
All test failures have the release-tests tag.

Closes #106871
2024-04-02 09:36:50 +03:00
Mark Vieira
7af3c8db01
AwaitsFix #106871 2024-03-30 11:13:05 -07:00
Tim Vernum
547e227ea2
Allow users to get status of own async search tasks (#106638)
This consists of 3 changes:

1. Refactoring the code so that all the security logic in the async search code is moved to AsyncSearchSecurity
2. Changing TransportGetAsyncStatusAction to check for ownership if the user does not have explicit access to the GetAsyncStatusAction (if they have such access it means that they can get the status of all async searches)
3. In RBACEngine, if a user does not have permission to GetAsyncStatusAction but does have permission to submit async searches, then let them run the action, relying on point 2 above.

Co-authored-by: Michael Peterson <michael.peterson@elastic.co>
2024-03-28 11:06:10 +11:00
Michael Peterson
01db0812a3
Improve error message for CrossClusterAsyncSearchIT test (#105988)
Will help debug issues like https://github.com/elastic/elasticsearch/issues/105865, which is a
non-reproducible occasional test error.
2024-03-07 08:59:58 -05:00
Michael Peterson
e0d2616c3b
CCS with minimize_roundtrips performs incremental merges of each SearchResponse (#105781)
This restores the functionality that was removed from 8.13
(waiting for a change on the Kibana side). The work for this feature
was added in #103134 but we had to remove the yaml changelog when
we turned off the functionality in #105455. So this PR restores the
changelog yaml as well.
2024-02-23 11:17:42 -05:00
Armin Braun
d693fc8b19
Fix search response leaks in async search tests (#105675)
Fixing all of these muted test classes, tried my best to keep indentation changes to a minimum
but it wasn't possible to avoid them in all cases unfortunately.
2024-02-21 14:12:52 +01:00
Michael Peterson
84def3ad85
Async status response should set is_partial the same way that async response does (#104479) 2024-02-14 08:57:37 -05:00
Michael Peterson
8e53b0f0bf
Cross cluster search minimize roundtrips avoids incremental merges until Kibana polls via async-search-status endpoint (#105455)
This is a temporary change to avoid doing incremental merges in
cross-cluster async-search when minimize_roundtrips=true.

Currently, Kibana polls for status of the async-search via the
_async_search endpoint, which (without this change) will
do an incremental merge of all search results. Once Kibana
moves to polling status via _async_search/status, then
we will undo the change in this commit.
2024-02-13 14:21:24 -05:00
Jack Conradson
b5828fbb67
Add plumbing to check cluster features in SearchSourceBuilder (#105417)
This change adds additional plumbing to pipe through the available cluster features into
SearchSourceBuilder. A number of different APIs use SearchSourceBuilder, so they had to make this
available through their parsers as well, often through ParserContext. This change is largely mechanical,
passing a Predicate into existing REST actions to check for feature availability.

Note that this change was pulled mostly from this PR (#105040).
2024-02-13 08:30:04 -08:00
Armin Braun
5c8006499a
Move test-only search response x-content-parsing code to test codebase (#105308)
Loads of code here that is only used in tests and one duplicate unused
class that was only used as an indirection to parsing the
`AsyncSearchResponse`. Moved what I could easily move via automated
refactoring to `SearchResponseUtils` in tests and removed the duplicate
now unused class from the client codebase.
2024-02-09 11:56:39 +01:00
Ryan Ernst
7c039b1728
AwaitsFix more tests for #104838 2024-02-06 16:42:01 -08:00
Ryan Ernst
b67f5a6b57
Make cluster feature predicate available to plugins (#105022)
A predicate to check whether the cluster supports a feature is available
to rest handlers defined in server. This commit adds that predicate to
plugins defining rest handlers as well.
2024-02-01 09:11:18 -08:00
Michael Peterson
06a25b60c9
Add keep_alive param to the async-search status endpoint (#104629) 2024-01-31 17:25:37 -05:00