elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-27 17:10:22 -04:00

Author	SHA1	Message	Date
Ben Chaplin	053895854d	Always log data node failures (#127420) Log search exceptions as they occur on the data node no matter the value of error_trace.	2025-04-29 09:40:31 -04:00
Dianna Hohensee	0700b24dd0	Create some general test utilities (#127407) Moving around and adding some test utilities.	2025-04-28 14:10:28 -04:00
Niels Bauman	c72d00fd39	Don't start a new node in `InternalTestCluster#getClient` (#127318) This method would default to starting a new node when the cluster was empty. This is pretty trappy as `getClient()` (or things like `getMaster()` that depend on `getClient()`) don't look at all like something that would start a new node. In any case, the intention of tests is much clearer when they explicitly define a cluster configuration.	2025-04-25 10:07:52 +02:00
David Turner	1461820dac	Fix race condition in `RestCancellableNodeClient` (#126686) Today we rely on registering the channel after registering the task to be cancelled to ensure that the task is cancelled even if the channel is closed concurrently. However the client may already have processed a cancellable request on the channel and therefore this mechanism doesn't work. With this change we make sure not to register another task after draining the registrations in order to cancel them. Closes #88201	2025-04-12 00:59:46 +10:00
Ben Chaplin	9f6eb1d4e3	Log stack traces on data nodes before they are cleared for transport (#125732) We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false.	2025-04-03 13:45:09 -04:00
Niels Bauman	483f97915c	Run `TransportGetIndexAction` on local node (#125652) This action solely needs the cluster state, it can run on any node. Since this is the last class/action that extends the `ClusterInfo` abstract classes, we remove those classes too as they're not required anymore. Relates #101805	2025-04-02 18:41:35 +01:00
Niels Bauman	eb4d64f94a	Run `TransportGetSettingsAction` on local node (#126051) This action solely needs the cluster state, it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout. Relates #101805	2025-04-02 15:05:31 +01:00
Armin Braun	fd2cc97541	Introduce batched query execution and data-node side reduce (#121885) This change moves the query phase to a single roundtrip per node, just like can_match or field_caps already work. As a result of executing multiple shard queries from a single request, we can also partially reduce each node's query results on the data node side before responding to the coordinating node. Overall, this change significantly reduces the impact of network latencies on end-to-end query performance, and reduces both the amount of work done (memory and CPU) on the coordinating node and the network traffic by factors of up to the number of shards per data node! Benchmarking shows improvements of up to orders of magnitude in heap usage and network traffic when querying across a larger number of shards.	2025-03-29 16:53:18 +01:00
Mark Vieira	0388a5980c	Migrate legacy QA projects to new test clusters framework (#125545)	2025-03-26 10:05:56 -07:00
Niels Bauman	481d91c428	Run `TransportGetMappingsAction` on local node (#122921) This action solely needs the cluster state, it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout. Relates #101805	2025-03-15 07:59:28 +00:00
Armin Braun	425823cb5c	Remove some overhead from TransportService message handling (#124428) Avoiding some indirection, volatile-reads and moving the listener functionality that needlessly kept iterating an empty CoW list (creating iterator instances, volatile reads, more code) in an effort to improve the low IPC on transport threads.	2025-03-09 16:00:11 +01:00
Armin Braun	d3abf9d5ba	Dry up search error trace ITs (#122138) This logic will need a bit of adjustment for bulk query execution. Let's dry it up first so we don't have to copy and paste the fix, which will be a couple of lines.	2025-02-10 08:48:49 +01:00
Artem Prigoda	62f0fe869a	Remove the `failures` field from snapshot responses (#114496) Failure handling for snapshots was made stricter in #107191 (8.15), so this field has always been empty since then. Clients don't need to check it anymore for failure handling, so we can remove it from API responses in 9.0.	2025-02-05 15:35:38 +01:00
Niels Bauman	5efe216958	Run `GetPipelineTransportAction` on local node (#120445) This action solely needs the cluster state, it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout. Relates #101805	2025-01-22 08:16:31 +10:00
Niels Bauman	4ccd377d27	Run `TransportClusterGetSettingsAction` on local node (#119831) This action solely needs the cluster state, it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout. The `?local` parameter becomes a no-op and is marked as deprecated.	2025-01-14 03:45:58 +00:00
Niels Bauman	27a9c4d911	Run template simulation actions on local node (#120038) The actions `TransportSimulateTemplateAction` and `TransportSimulateIndexTemplateAction` solely need the cluster state, they can run on any node. Additionally, they need to be cancellable to avoid doing unnecessary work after a client failure or timeout. As a drive-by, this removes more usages of the trappy default master node timeout.	2025-01-14 12:41:05 +10:00
Niels Bauman	80e8017bb6	Run `TransportGetIndexTemplatesAction` on local node (#119837) This action solely needs the cluster state, it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout. As a drive-by, this removes another usage of the trappy default master node timeout.	2025-01-10 00:20:16 +00:00
Niels Bauman	65e4ec129c	Run `TransportGetComposableIndexTemplate` on local node (#119830) This action solely needs the cluster state, it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout. As a drive-by, this removes another usage of the trappy default master node timeout.	2025-01-10 09:00:31 +10:00
Niels Bauman	9641c7623f	Run `TransportGetComponentTemplateAction` on local node (#116868) This action solely needs the cluster state, it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout. The `?local` parameter becomes a no-op and is marked as deprecated. Relates #101805 Relates #107984	2024-12-23 20:01:21 +00:00
Matteo Piergiovanni	97bc2919ff	Prevent data nodes from sending stack traces to coordinator when `error_trace=false` (#118266) * first iterations * added tests * Update docs/changelog/118266.yaml * constant for error_trace and typos * centralized putHeader * moved threadContext to parent class * uses NodeClient.threadpool * updated async tests to retrieve final result * moved test to avoid starting up a node * added transport version to avoid sending useless bytes * more async tests	2024-12-18 15:29:35 +01:00
Henrique Paes	4740b02a9b	Wrap jackson exception on malformed json string (#114445) This commit hides the underlying Jackson parse exception when encountered while parsing string tokens.	2024-12-05 09:22:48 -08:00
Simon Cooper	04e04ceaf1	Remove Version from system index descriptors (#115793) Now it just uses mapping versions	2024-10-31 11:12:15 +00:00
Tim Brooks	e144184896	Standardize error code when bulk body is invalid (#114869) Currently the incremental and non-incremental bulk variations will return different error codes when the json body provided is invalid. This commit ensures both versions return status code 400. Additionally, this renames the incremental rest tests to bulk tests and ensures that all tests work with both bulk API versions. We set these tests to randomize which version of the API we test on each run.	2024-10-16 12:18:35 -06:00
David Turner	cd427198dc	More verbose logging in `IndicesSegmentsRestCancellationIT` (#113844) Relates #88201	2024-10-03 19:53:47 +10:00
Tim Brooks	6759ae2e89	Introduce watermarks for indexing pressure backoff (#113912) Currently we have a relatively basic decider for when to throttle indexing. This commit adds two levels of watermarks with configurable bulk size deciders. It also adds settings to control primary, coordinating, and replica rejection limits.	2024-10-02 10:06:33 -06:00
David Turner	e9d0dd9e28	Fix `testClusterHealthRestCancellation` (#113680) This test was failing due to a race between an early cancellation check and the cancel operation. With this commit we wait until the action is definitely blocked before cancelling the task. Closes #100062	2024-09-30 07:47:09 +01:00
Tim Brooks	d146b27a26	Default incremental bulk functionality to false (#113416) This commit flips the incremental bulk setting to false. Additionally, it removes some test code which intermittently causes issues with security test cases.	2024-09-24 06:26:48 +10:00
Tim Brooks	c5caf84e2d	Move raw path into HttpPreRequest (#113231) Currently, the raw path is only available from the RestRequest. This makes the logic to determine if a handler supports streaming more challenging to evaluate. This commit moves the raw path into pre request to allow easier streaming support logic.	2024-09-21 05:32:45 +10:00
David Turner	6ff138f558	Drop useless `AckedRequest` interface (#113255) Almost every implementation of `AckedRequest` is an `AcknowledgedRequest` too, and the distinction is rather confusing. Moreover the other implementations of `AckedRequest` are a potential source of `null` timeouts that we'd like to get rid of. This commit simplifies the situation by dropping the unnecessary `AckedRequest` interface entirely.	2024-09-20 12:33:07 +01:00
Tim Brooks	92daeeba11	Properly handle empty incremental bulk requests (#112974) This commit ensures we properly throw exceptions when an empty bulk request is received with the incremental handling enabled.	2024-09-18 13:52:10 -06:00
Mikhail Berezovskiy	dce8a0bfd3	merge main	2024-09-18 13:52:10 -06:00
Tim Brooks	95b42a7129	Ensure incremental bulk setting is set atomically (#112479) Currently the `rest.incremental_bulk` setting is read in two different places. This means that it may be applied in two separate steps, introducing unpredictable behavior. This commit ensures that it is only read in a single place.	2024-09-18 13:40:39 -06:00
Tim Brooks	a03fb12b09	Incremental bulk integration with rest layer (#112154) Integrate the incremental bulks into RestBulkAction	2024-09-18 13:40:39 -06:00
Mark Vieira	a59c182f9f	Add AGPLv3 as a supported license	2024-09-13 15:29:46 -07:00
Mikhail Berezovskiy	c1c5fe64b3	Use opaque id in task cancellation assertion (#110680) Add use of Opaque ID HTTP header in task cancellation assertion. In some tests, like `testCatSegmentsRestCancellation` (#88201), we assert that all tasks related to a specific HTTP request are cancelled. But we take a blanket approach in the assertion block, catching all tasks by action name. I think narrowing the assertion down to the specific HTTP request in this case would be more accurate. It is still not clear why the test mentioned above is failing, but after hours of investigation and injecting random delays, I'm leaning more towards @DaveCTurner's comment about interference from other tests or cluster activity. I added an additional log that will report when we spot a task with a different opaque id.	2024-07-12 14:35:57 +10:00
David Turner	5662f988b2	Remove trappy timeouts in snapshot APIs (#109828) Wholesale fix of every `TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT` in `o.e.snapshots` and `o.e.repositories`, just pulling them up to the REST layer (where they become API params), the test suite (where they become `TEST_REQUEST_TIMEOUT`), or some other place where an explicit value is available. Relates #107984	2024-06-21 07:11:12 +10:00
Patrick Doyle	43b2e877e0	Revert "Move PluginsService to its own internal package (#109872)" (#109946) This reverts commit `b9e7965184`.	2024-06-19 18:10:50 -04:00
Patrick Doyle	b9e7965184	Move PluginsService to its own internal package (#109872) * Mechanical package change in IntelliJ * A couple of manual fixups * Export plugins.loading to deprecation * Put plugin-cli in a module so we can export PluginsUtils to it.	2024-06-19 15:23:47 -04:00
Ievgen Degtiarenko	d3a285e1c7	Fix testDanglingIndicesCanBeListed (#108599) The test started failing because of the recent changes to allow closing (and deleting shards) asynchronously. As a result, the dangling indices API now sees a directory in a partially deleted state, fails to interpret the partial data, and errors out. The fix retries the failure on the client.	2024-05-14 11:40:27 +02:00
David Turner	30d31bffb2	Introduce `RestUtils#getMasterNodeTimeout` (#107986) Many APIs accept a `?master_timeout` parameter, but reading this parameter requires a little unnecessary boilerplate to specify the literal parameter name and default value. Moreover, today's convention is to construct a `MasterNodeRequest` and then read the default master timeout from the freshly-created request. In practice this results in a default of 30s, but we specify in the docs that this default is _always_ 30s, and in principle one could create a transport request with a different initial value which would deviate from the documented behaviour. This commit introduces a utility method for reading this parameter in a fashion which is completely consistent with the documented behaviour. Relates #107984	2024-04-29 08:03:32 +01:00
Armin Braun	05a2ff0375	Remove some more ActionType implementations (#107664) Cleaning up a couple more of these.	2024-04-20 20:01:04 +02:00
Jonathan Buttner	d8348560a9	muting (#107496) Muting https://github.com/elastic/elasticsearch/issues/100062	2024-04-15 17:17:34 -04:00
Ievgen Degtiarenko	32bcb13ac4	Introduce an easy way to get node id by its name (#107392) Our test utility returns the node name when starting a new node. A lot of APIs (such as routing table or node shutdown) require a node id. This change introduces a simple way to retrieve the node id based on its name.	2024-04-12 10:50:11 +02:00
David Turner	9a907704b7	Move `XContent` -> `SnapshotInfo` parsing out of prod (#106669) The code to parse a `SnapshotInfo` object out of an `XContent` response body is only used in tests, so this commit moves it out of the production codebase and into the test framework.	2024-03-22 09:46:46 -04:00
David Turner	12e567d29e	Consolidate get-snapshots `?after` logic (#106038) Today the handling of the `?after` param is kinda spread out over `TransportGetSnapshotsAction` and `GetSnapshotsRequest` making it hard to follow and adding unnecessary complexity to these two classes. This commit moves it into `SnapshotSortKey` which is a better fit since the behaviour varies so much for different sort keys.	2024-03-12 05:16:46 -04:00
David Turner	1fae3e7501	Extract `SnapshotSortKey` (#106015) The behaviour of the get-snapshots API varies quite considerably depending on the sort key chosen. Today this logic is implemented using scattered `switch` statements and other conditionals but it'd be clearer if we delegated this stuff to the sort key instances themselves. This commit moves the sort key enum to the top level and replaces one of the `switch` statements with a method on the enum instances.	2024-03-06 15:27:57 +00:00
David Turner	7cbdb6cc19	Drop dead code from get-snapshots request & response (#105608) Removes all the now-dead code related to reading pre-7.16 get-snapshots requests and responses, and also moves the `XContent` response parsing out of production and into the only test suite that uses it.	2024-02-21 07:57:50 +00:00
Ryan Ernst	b67f5a6b57	Make cluster feature predicate available to plugins (#105022) A predicate to check whether the cluster supports a feature is available to rest handlers defined in server. This commit adds that predicate to plugins defining rest handlers as well.	2024-02-01 09:11:18 -08:00
Simon Cooper	016c778321	Remove NamedWriteableRegistry from NodeClient, pass it directly through to rest actions (#103277)	2024-01-11 12:42:22 +00:00
Lee Hinman	d297d79927	Fix `require_alias` implicit true value on presence (#104099) * Fix `require_alias` implicit true value on presence This commit brings the `require_alias` query-string parameter into line with the rest of our parameters where its presence indicates an implicit "true" value (so a user can do `POST /_bulk?require_alias` to enable the check). Resolves #103945 * Update docs/changelog/104099.yaml	2024-01-09 10:08:41 -07:00

270 commits