elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-22 06:07:55 -04:00

Author	SHA1	Message	Date
Oleksandr Kolomiiets	987e9f7595	Unmute testCreateAndRestoreSearchableSnapshot and add more context to assert (#127131 ) See #119709.	2025-04-22 09:09:32 +10:00
Armin Braun	c662590b6d	Make DelayableWriteable compress its contents (#126988 ) In light of data from recent escalations and the introduction of batched execution, we can make two improvements to this logic. For one, we should prefix the payload with a fixed-length length field so that we don't need to do any copying when serializing to account for the vint. This outright halves the memory bandwidth required relative to the previous implementation. Perhaps more importantly, we should compress these bytes. The wire format for aggregations is rather inefficient when working with nested bucket aggregations, since the type strings are repeated over and over. These don't contribute to the peak heap requirements because they are translated into Java types, but they blow up the message size considerably (among other things). In practice, we often seem to get compression ratios of ~10x for aggregations. Given that we generally have more memory issues than CPU issues during the reduce step, trading a little CPU for compression in exchange for serious heap savings seems like an easy tradeoff.	2025-04-19 19:53:46 +02:00
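A minimal sketch of the two ideas in the DelayableWriteable commit above, assuming plain `java.util.zip` DEFLATE (the codec Elasticsearch actually uses is not stated there). `CompressedFrame` and its methods are hypothetical names, not part of the `DelayableWriteable` API: the frame reserves a fixed 4-byte length slot before writing, so no variable-width vint has to be prepended afterwards, and the body is compressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical framing helper: [4-byte big-endian length][DEFLATE-compressed payload]. */
final class CompressedFrame {
    private CompressedFrame() {}

    static byte[] write(byte[] payload) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Reserve a fixed-width slot for the length. Unlike a vint, its size does not
            // depend on the value, so it can be filled in after the body has been written.
            out.write(new byte[Integer.BYTES]);
            try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
                deflater.write(payload);
            }
            byte[] frame = out.toByteArray();
            // Patch the compressed body length into the reserved slot.
            ByteBuffer.wrap(frame).putInt(0, frame.length - Integer.BYTES);
            return frame;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] read(byte[] frame) {
        int length = ByteBuffer.wrap(frame).getInt(0);
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(frame, Integer.BYTES, length))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Highly repetitive input, like the repeated type strings mentioned in the commit, compresses very well; actual ratios depend on the payload.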
Kathleen DeRusso	e280aa5d50	Revert semantic_text model registry changes (#127075 )	2025-04-18 18:36:33 -04:00
James Baiera	7b89f4d4a6	Add ability to redirect ingestion failures on data streams to a failure store (#126973 ) Removes the feature flags and guards that prevent the new failure store functionality from operating in production runtimes.	2025-04-18 16:33:03 -04:00
Armin Braun	f461f90d48	Remove redundant marker interfaces that extend Bucket (#127038 ) No need to have these marker interfaces around when we're not using them anywhere; all they actually do is hide a lot of code duplication. Removing them sets up the possible removal of hundreds of lines of downstream code, it seems.	2025-04-18 18:26:39 +02:00
Niels Bauman	a81c4491f0	Fix timeout for awaiting index existence (#126773 ) #126692 allowed consumers to specify a timeout to `awaitIndexExists`, but that timeout did not get propagated correctly to all the required places.	2025-04-18 11:27:52 +02:00
Oleksandr Kolomiiets	62c0629da6	Add new-style block loader tests for constant_keyword, version, wildcard (#126968 )	2025-04-17 13:22:09 -07:00
Niels Bauman	16070a342f	Fix tests in `TimeSeriesDataStreamsIT` (#126851 ) These tests had the potential to fail when subsequent requests would hit different nodes with different versions of the cluster state. Only one of these tests failed already, but we fix the other ones proactively to avoid future failures. Fixes #126746	2025-04-17 16:35:43 +02:00
Kathleen DeRusso	a72883e8e3	Default new semantic_text fields to use BBQ when models are compatible (#126629 ) * Default new semantic_text fields to use BBQ when models are compatible * Update docs/changelog/126629.yaml * Gate default BBQ by IndexVersion * Cleanup from PR feedback * PR feedback * Fix test * Fix test * PR feedback * Update test to test correct options * Hack alert: Fix issue where mapper service was always being created with current index version	2025-04-17 08:25:10 -04:00
Nik Everett	128144dd6d	ESQL: Add `documents_found` and `values_loaded` (#125631 ) This adds `documents_found` and `values_loaded` to the ESQL response: ```json { "took" : 194, "is_partial" : false, "documents_found" : 100000, "values_loaded" : 200000, "columns" : [ { "name" : "a", "type" : "long" }, { "name" : "b", "type" : "long" } ], "values" : [[10, 1]] } ``` These are cheap enough to collect that we can do it for every query and return it with every response. It's small, but it still gives you a reasonable sense of how much work Elasticsearch had to go through to perform the query. I've also added these two fields to the driver profile and task status: ```json "drivers" : [ { "description" : "data", "cluster_name" : "runTask", "node_name" : "runTask-0", "start_millis" : 1742923173077, "stop_millis" : 1742923173087, "took_nanos" : 9557014, "cpu_nanos" : 9091340, "documents_found" : 5, <---- THESE "values_loaded" : 15, <---- THESE "iterations" : 6, ... ``` These are at a high level and should be easy to reason about. We'd like to extract this into a "show me how difficult this running query is" API one day. But today, just plumbing it into the debugging output is good. Any `Operator` can claim to "find documents" or "load values" by overriding a method on its `Operator.Status` implementation: ```java /** * The number of documents found by this operator. Most operators * don't find documents and will return {@code 0} here. */ default long documentsFound() { return 0; } /** * The number of values loaded by this operator. Most operators * don't load values and will return {@code 0} here. */ default long valuesLoaded() { return 0; } ``` In this PR all of the `LuceneOperator`s declare that each `position` they emit is a "document found" and the `ValuesSourceReaderOperator` says each value it makes is a "value loaded". That's pretty much true. 
The `LuceneCountOperator` and `LuceneMinMaxOperator` sort of pretend that the count/min/max that they emit is a "document", but that's good enough to give you a sense of what's going on. It's *like* a document.	2025-04-16 17:15:25 +02:00
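The default-method pattern described in the ESQL commit above can be sketched in isolation. Only `documentsFound()` and `valuesLoaded()` come from the commit message; `OperatorStatus` is a stripped-down stand-in for `Operator.Status`, the three record types are hypothetical, and summing the values across a driver's operators is an assumption about how the per-driver totals are produced.

```java
import java.util.List;

/** Stand-in for Operator.Status: most operators inherit both zero defaults. */
interface OperatorStatus {
    default long documentsFound() {
        return 0;
    }

    default long valuesLoaded() {
        return 0;
    }
}

// Hypothetical: a Lucene source counts every emitted position as a document found.
record LuceneSourceStatus(long positionsEmitted) implements OperatorStatus {
    @Override
    public long documentsFound() {
        return positionsEmitted;
    }
}

// Hypothetical: a values reader counts every value it produces as a value loaded.
record ValuesReaderStatus(long valuesRead) implements OperatorStatus {
    @Override
    public long valuesLoaded() {
        return valuesRead;
    }
}

// Hypothetical operator that neither finds documents nor loads values.
record FilterStatus() implements OperatorStatus {}

/** Assumed aggregation: a driver's totals are the sums over its operators. */
final class DriverTotals {
    private DriverTotals() {}

    static long documentsFound(List<? extends OperatorStatus> statuses) {
        return statuses.stream().mapToLong(OperatorStatus::documentsFound).sum();
    }

    static long valuesLoaded(List<? extends OperatorStatus> statuses) {
        return statuses.stream().mapToLong(OperatorStatus::valuesLoaded).sum();
    }
}
```

With one Lucene source that emitted 5 positions and one values reader that loaded 15 values, the totals come out as 5 and 15, the same shape as the driver profile example.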
Andrei Dan	e74c237059	Enable online prewarming SPI in integration tests (#126777 ) Integration tests use the MockNode. This adds the SPI lookup when building the MockSearchService. This will enable us to have the online prewarming implementation available in ESIntegTestCase.	2025-04-16 14:01:36 +01:00
Nik Everett	2e437577d8	ESQL: Create fewer documents in lookup tests (#126874 ) This lowers the number of documents used to test lookup because we have had a few failures over the last few months. These are all cases that we expect to pass, so fewer documents should make them even more likely to pass. Closes #125913 Closes #125779	2025-04-16 05:56:47 +10:00
Mikhail Berezovskiy	5a7a425bd0	Refactor GCS fixture multipart parser (#125828 )	2025-04-15 10:09:53 -07:00
David Turner	aa40147142	Add integ tests for `ftp://` URL repository (#126757 ) We document support for snapshot repositories using `ftp://` URLs but it seems this functionality has not worked for many years because of security-manager restrictions, although nobody noticed because it was not covered by any tests. The migration to the Entitlements framework means that this functionality now works again, so this commit adds tests to make sure we do not break it again in future.	2025-04-15 12:57:00 +01:00
Ievgen Degtiarenko	07cb14e7a9	Expose more detailed profiling information (#126525 )	2025-04-15 12:27:31 +02:00
Nick Tindall	358b724bd8	Deduplicate monitoring of balancer settings (#126752 )	2025-04-15 16:58:27 +10:00
Jim Ferenczi	46c3657255	Fix and unmute SemanticInferenceMetadataFieldsRecoveryTests (#126784 ) Use the TranslogOperationAsserter to compare the raw operations. Closes #124383 Closes #124384 Closes #124385	2025-04-15 08:36:20 +02:00
Ryan Ernst	83ce15ae06	Make TransportRequest an interface (#126733 ) In order to support a future TransportRequest variant that accepts the response type, TransportRequest needs to be an interface. This commit adds AbstractTransportRequest as a concrete implementation and makes TransportRequest a simple interface that joins together the parent interfaces from TransportMessage. Note that this was done entirely in IntelliJ using structural find and replace.	2025-04-14 14:22:28 -07:00
Mary Gouseti	e461717627	Test fix: align timeouts in `testDataStreamLifecycleDownsampleRollingRestart` (#123769 ) (#126682 ) Recently we changed the implementation of `testDataStreamLifecycleDownsampleRollingRestart` to use a temporary state listener. We missed that the listener also had a timeout that was considerably shorter than the `safeGet` timeout we were configuring. In this PR we align these two timeouts. Fixes: #123769	2025-04-15 02:53:59 +10:00
Oleksandr Kolomiiets	9d18d5280a	Add block loader from stored field and source for ip field (#126644 )	2025-04-11 13:37:15 -07:00
Andrei Dan	fa09255182	Online prewarming service interface docs and usage in SearchService (#126561 ) This adds the interface for search online prewarming with a default NOOP implementation. This also hooks the interface in the SearchService after we fork the query phase to the search thread pool.	2025-04-11 17:53:50 +01:00
David Turner	800cf72e1f	Use `TimeValue` for timeouts in `safeAwait` etc. (#126509 ) There's no need to force callers to deconstruct the `TimeValue` in their possession into a `long` and a `TimeUnit`, we can do it ourselves.	2025-04-12 02:46:28 +10:00
Niels Bauman	507f40cd72	Fix `ILMDownsampleDisruptionIT.testILMDownsampleRollingRestart` (#126692 ) Wait for the index to exist on the master node to ensure all nodes have the latest cluster state. Fixes #126495	2025-04-11 17:45:45 +02:00
Lorenzo Dematté	e4af657c12	Patcher improvements (HDFS) (#126449 ) Patchers transform specific classes in some "broken" dependencies to ensure they behave correctly (fixing a bug, disabling some undesired or dangerous behaviour, updating calls to deprecated or removed method overloads). If we upgrade one of the dependencies we patch, we are concerned that the patchers may not work against the classes in the new version. This PR addresses this concern by introducing a check on the SHA-256 digest of the class, to ensure we are operating on the same bytes the patcher was designed for; if the digest changes, that means the class has been changed (e.g. by a dependency update). If that happens, we break the build process with a specific error, so we can double-check that the patchers still work against the new classes. Extracted from #126326 Relates to ES-11279	2025-04-11 17:20:45 +02:00
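The digest check described in the patcher commit above might look like this, assuming the expected SHA-256 is stored as a hex string next to each patcher. `PatchGuard` and its methods are hypothetical; only the idea of comparing a SHA-256 digest of the class bytes and failing loudly on a mismatch comes from the commit message.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical guard: refuse to patch class bytes the patcher was not written against. */
final class PatchGuard {
    private PatchGuard() {}

    static String sha256Hex(byte[] classBytes) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(classBytes)); // lowercase hex
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    static void checkDigest(String className, byte[] classBytes, String expectedSha256) {
        String actual = sha256Hex(classBytes);
        if (!actual.equalsIgnoreCase(expectedSha256)) {
            // Fail the build step: the dependency probably changed, so the patcher needs review.
            throw new IllegalStateException("patcher for [" + className + "] expects SHA-256 "
                + expectedSha256 + " but the class has " + actual + "; re-verify the patcher");
        }
    }
}
```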
Martijn van Groningen	6012590929	Improve resiliency of UpdateTimeSeriesRangeService (#126637 ) If updating the `index.time_series.end_time` fails for one data stream, then UpdateTimeSeriesRangeService should continue updating this setting for other data streams. The following error was observed in the wild: ``` [2025-04-07T08:50:39,698][WARN ][o.e.d.UpdateTimeSeriesRangeService] [node-01] failed to update tsdb data stream end times java.lang.IllegalArgumentException: [index.time_series.end_time] requires [index.mode=time_series] at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:636) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:619) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.common.settings.Setting.get(Setting.java:563) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.common.settings.Setting.get(Setting.java:535) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService.updateTimeSeriesTemporalRange(UpdateTimeSeriesRangeService.java:111) ~[?:?] at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService$UpdateTimeSeriesExecutor.execute(UpdateTimeSeriesRangeService.java:210) ~[?:?] at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1075) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1038) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:245) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1691) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1688) ~[elasticsearch-8.17.3.jar:?] 
at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1283) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1262) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-8.17.3.jar:?] at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-8.17.3.jar:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?] at java.lang.Thread.run(Thread.java:1575) ~[?:?] ``` As a result, the `index.time_series.end_time` index setting was not updated for any data stream. This then caused data loss, as metrics couldn't be indexed because no suitable backing index could be resolved: ``` the document timestamp [2025-03-26T15:26:10.000Z] is outside of ranges of currently writable indices [[2025-01-31T07:22:43.000Z,2025-02-15T07:24:06.000Z][2025-02-15T07:24:06.000Z,2025-03-02T07:34:07.000Z][2025-03-02T07:34:07.000Z,2025-03-10T12:45:37.000Z][2025-03-10T12:45:37.000Z,2025-03-10T14:30:37.000Z][2025-03-10T14:30:37.000Z,2025-03-25T12:50:40.000Z][2025-03-25T12:50:40.000Z,2025-03-25T14:35:40.000Z ```	2025-04-11 12:58:10 +02:00
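A minimal sketch of the per-data-stream isolation the UpdateTimeSeriesRangeService commit calls for. `EndTimeUpdater`, `EndTimeUpdateResult` and the `computeNewEndTime` function are hypothetical; the point is only that each data stream gets its own try/catch, so a validation failure like the `IllegalArgumentException` in the stack trace above no longer stops the remaining data streams from being updated.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical result: new end times that were computed, plus per-stream failures. */
record EndTimeUpdateResult(Map<String, Instant> updated, Map<String, RuntimeException> failures) {}

final class EndTimeUpdater {
    private EndTimeUpdater() {}

    static EndTimeUpdateResult updateAll(List<String> dataStreams, Function<String, Instant> computeNewEndTime) {
        Map<String, Instant> updated = new LinkedHashMap<>();
        Map<String, RuntimeException> failures = new LinkedHashMap<>();
        for (String dataStream : dataStreams) {
            try {
                updated.put(dataStream, computeNewEndTime.apply(dataStream));
            } catch (RuntimeException e) {
                // Record the failure (a real service would log it) and continue with the next stream.
                failures.put(dataStream, e);
            }
        }
        return new EndTimeUpdateResult(updated, failures);
    }
}
```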
David Turner	b10b35fccd	Fix `S3RepositoryAnalysisRestIT` (#126593 ) - Translate a 404 during a multipart copy into a `FileNotFoundException` - Use multiple threads in `S3HttpHandler` to avoid `CopyObject`/`PutObject` deadlock Closes #126576	2025-04-11 05:41:20 +10:00
Tanguy Leroux	591fa87e43	Revive read/write engine lock to guard operations against resets (#126311 ) This change re-introduces the engine read/write lock to guard against engine resets. It differs from #124635 in the following ways: it uses the engineMutex for creating/closing engines; it uses the reentrant r/w lock for retaining engine instances and for resetting the engine; it acquires the reentrant read lock during refreshes to prevent deadlocks during resets; and it adds tests to ensure there is no deadlock when re-acquiring the read lock in refresh listeners. Relates ES-11447	2025-04-10 13:37:48 +02:00
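The lock discipline described in the engine-reset commit can be sketched with `java.util.concurrent.locks.ReentrantReadWriteLock`. `EngineHolder` is hypothetical and omits the separate `engineMutex` used for creating and closing engines: operations hold the read lock while they use the engine, and a reset swaps the engine under the write lock, so it waits for in-flight operations to finish.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical holder: reads retain the current engine, resets replace it exclusively. */
final class EngineHolder<E> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private E engine; // only read or written while holding the lock

    EngineHolder(E initial) {
        this.engine = initial;
    }

    /** Runs an operation against the engine; the read lock is reentrant for the same thread. */
    <T> T withEngine(Function<E, T> operation) {
        lock.readLock().lock();
        try {
            return operation.apply(engine);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Swaps the engine once all in-flight readers have released the read lock. */
    void reset(Supplier<E> newEngine) {
        lock.writeLock().lock();
        try {
            engine = newEngine.get();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Because the read lock is reentrant, a refresh listener that re-enters `withEngine` on the same thread does not deadlock. Upgrading is not supported, though: calling `reset` while holding the read lock would block forever, so resets must happen outside any `withEngine` call.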
David Turner	9e0d885702	Reduce `assertBusy` usage in `testMultipleNodes` (#126582 ) Relates #126501	2025-04-10 18:28:36 +10:00
Yang Wang	62636f958b	Replace assertBusy of indexExists (#126501 ) Relates: https://github.com/elastic/elasticsearch/pull/126437#pullrequestreview-2748766613	2025-04-10 10:56:52 +10:00
Brendan Cully	c1a71ff45c	BlobContainer: add copyBlob method (#125737 ) * BlobContainer: add copyBlob method If a container implements copyBlob, then the copy is performed by the store, without client-side IO. If the store does not provide a copy operation then the default implementation throws UnsupportedOperationException. This change provides implementations for the FS and S3 blob containers. More will follow. Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> Co-authored-by: David Turner <david.turner@elastic.co>	2025-04-09 10:33:01 -07:00
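The default-throwing pattern from the `copyBlob` commit can be sketched as follows. `SimpleBlobContainer` and `FsBlobContainerSketch` are hypothetical; the real `BlobContainer.copyBlob` signature is not given in the commit message and is not reproduced here. The filesystem variant delegates to `java.nio.file.Files.copy`, so the bytes never pass through the caller's own streams.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical, simplified container interface. */
interface SimpleBlobContainer {
    /** Stores without a native copy operation inherit this default. */
    default void copyBlob(String sourceBlobName, String targetBlobName) throws IOException {
        throw new UnsupportedOperationException("this blob container does not support copyBlob");
    }
}

/** Hypothetical filesystem-backed container where the store itself performs the copy. */
final class FsBlobContainerSketch implements SimpleBlobContainer {
    private final Path path;

    FsBlobContainerSketch(Path path) {
        this.path = path;
    }

    @Override
    public void copyBlob(String sourceBlobName, String targetBlobName) throws IOException {
        // Fails with FileAlreadyExistsException if the target blob already exists.
        Files.copy(path.resolve(sourceBlobName), path.resolve(targetBlobName));
    }
}
```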
Ryan Ernst	3bac50e818	Use logs dir as working directory (#124966 ) In the unexpected case that Elasticsearch dies due to a segfault or other similar native issue, a core dump is useful in diagnosing the problem. Yet core dumps are written to the working directory, which is read-only for most installations of Elasticsearch. This commit changes the working directory to the logs dir which should always be writeable.	2025-04-09 07:07:11 -07:00
Gal Lalouche	953b9fbb83	ESQL: List/get query API (#124832 ) This PR adds two new REST endpoints, for listing queries and for getting information on a running query. * Resolves #124827 * Related to #124828 (initial work) Changes from the API specified in the above issues: * The get API is still fairly basic, as we don't have a way of fetching the memory used or the number of rows processed. List queries response: ``` GET /_query/queries // returns for each of the running queries // query_id, start_time, running_time, query { "queries" : { "abc": { "id": "abc", "start_time_millis": 14585858875292, "running_time_nanos": 762794, "query": "FROM logs* | STATS BY hostname" }, "4321": { "id":"4321", "start_time_millis": 14585858823573, "running_time_nanos": 90231, "query": "FROM orders | LOOKUP country_code ON country" } } } ``` Get query response: ``` GET /_query/queries/abc { "id" : "abc", "start_time_millis": 14585858875292, "running_time_nanos": 762794, "query": "FROM logs* | STATS BY hostname", "coordinating_node": "oTUltX4IQMOUUVeiohTt8A", "data_nodes" : [ "DwrYwfytxthse49X4", "i5msnbUyWlpe86e7"] } ```	2025-04-08 22:21:32 +03:00
David Turner	aab40b1247	Introduce `TestBlobContainerBuilder` (#126445 ) The mostly-optional parameters to `createBlobContainer` are getting rather numerous in this test harness which makes the tests hard to read. This commit introduces a builder to help name the provided parameters and skip the omitted ones.	2025-04-09 01:52:16 +10:00
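A sketch of the builder idea from the `TestBlobContainerBuilder` commit. The option names (`maxRetries`, `containerPath`, `disableChunkedEncoding`) and the `BlobContainerOptions` record are hypothetical illustrations, not the harness's actual parameters; unset options stay empty so the harness can apply its own defaults, such as a randomly chosen path.

```java
import java.util.Optional;

/** Hypothetical options record handed to a createBlobContainer-style factory. */
record BlobContainerOptions(Optional<Integer> maxRetries, Optional<String> containerPath, boolean disableChunkedEncoding) {}

final class TestBlobContainerBuilderSketch {
    private Integer maxRetries;     // null: let the harness pick a default
    private String containerPath;   // null: let the harness choose a random path
    private boolean disableChunkedEncoding;

    TestBlobContainerBuilderSketch maxRetries(int maxRetries) {
        this.maxRetries = maxRetries;
        return this;
    }

    TestBlobContainerBuilderSketch containerPath(String containerPath) {
        this.containerPath = containerPath;
        return this;
    }

    TestBlobContainerBuilderSketch disableChunkedEncoding(boolean disableChunkedEncoding) {
        this.disableChunkedEncoding = disableChunkedEncoding;
        return this;
    }

    BlobContainerOptions build() {
        return new BlobContainerOptions(Optional.ofNullable(maxRetries), Optional.ofNullable(containerPath), disableChunkedEncoding);
    }
}
```

A call like `new TestBlobContainerBuilderSketch().maxRetries(3).build()` names the one option it cares about instead of passing a positional argument list padded with `null`s.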
Dianna Hohensee	4b2867a0ef	Support maxConnections override in AbstractBlobContainerRetriesTestCase tests (#126435 )	2025-04-08 09:55:01 -04:00
Ryan Ernst	991e80d56e	Remove unnecessary generic params from action classes (#126364 ) Transport actions have associated request and response classes. However, the base type restrictions are not necessary to duplicate when creating a map of transport actions. Relatedly, the ActionHandler class doesn't actually need strongly typed action type and classes since they are lost when shoved into the node client map. This commit removes these type restrictions and generic parameters.	2025-04-07 16:22:56 -07:00
David Turner	cedcb5ccfe	Replace `TransportResponse.Empty` with `ActionResponse.Empty` (#126400 ) No need to distinguish these things any more, we can just use `ActionResponse.Empty` everywhere.	2025-04-08 06:58:06 +10:00
David Turner	f6c1965101	Forward port changes from backport of #125562 (#126413 ) The backport to `8.x` needed some changes to pass through CI; this commit forward-ports the relevant bits of those changes back into `main` to keep the branches aligned.	2025-04-07 19:05:06 +01:00
David Turner	fbbbdd7eec	Allow overriding blob container path in tests (#126391 ) Some `AbstractBlobContainerRetriesTestCase#createBlobContainer` implementations choose a path for the container randomly, but we have a need for a test which re-creates the same container against a different `S3Service` and `BlobStore` and must therefore specify the same path each time. This commit exposes a parameter that lets callers specify a container path.	2025-04-08 03:54:37 +10:00
Oleksandr Kolomiiets	21ff72bef4	Use FallbackSyntheticSourceBlockLoader for text fields (#126237 )	2025-04-07 09:32:35 -07:00
David Turner	527d2a203b	Improve handling of empty response (#125562 ) Today `ActionResponse$Empty` implements `ToXContentObject`, but yields no bytes of content when serialized which creates an invalid JSON response. This commit removes the bogus interface and adjusts the affected REST APIs to send a `text/plain` response instead.	2025-04-07 12:10:07 +01:00
Jordan Powers	4c174a891f	Use Lucene101 postings format by default (#126080 ) Update the PerFieldFormatSupplier so that new standard indices use the Lucene101PostingsFormat instead of the current default ES812PostingsFormat. Currently, use of the new codec is gated behind a feature flag.	2025-04-04 12:41:27 -07:00
David Turner	7239540c91	Replace `region` with `regionSupplier` in all AWS tests (#126285 ) Rather than hard-coding a region name we should always auto-generate it randomly during test execution. This commit replaces the remaining fixed `String` arguments with a `Supplier<String>` argument to enable this.	2025-04-05 02:27:28 +11:00
Alexander Spies	8f38b13059	ESQL: Revert "Allow partial results by default in ES|QL (#125060 )" (#126286 ) This reverts commit `81555cc9d9` from https://github.com/elastic/elasticsearch/pull/125060. Fix https://github.com/elastic/elasticsearch/issues/126275 @idegtiarenko and I investigated and believe this needs reverting: silently dropping results from the query response in case any index is missing can lead to real problems if users don't spot their mistake. I'm also not sure if all the results will get dropped, or only from some nodes/shards/clusters, meaning that this might be hard to spot by users if only some results get dropped. The main PR has no transport version bump, no new ESQL capability, and was merged 15h ago, so it should be safe to just revert it. I noticed there was a linked Serverless PR on the original PR, but it merely disabled some obsolete tests on Serverless and doesn't require reverting itself.	2025-04-05 01:27:13 +11:00
David Turner	896598570c	Reinstate `S3SearchableSnapshotsCredentialsReloadIT` in FIPS JVMs (#126109 ) These tests fail in a FIPS JVM only because they use a secret key that is unacceptably short. This commit replaces the relevant uses of `randomIdentifier` with `randomSecretKey` so they work whether in FIPS mode or not.	2025-04-04 18:42:09 +11:00
David Turner	24fa8eaa63	Support `ListObjectsV2` in `S3HttpHandler` (#126189 ) `ListObjects` and `ListObjectsV2` only really differ in their approach to pagination, but today `S3HttpHandler` does not simulate pagination anyway so we can use the same handling code for both APIs. The only practical difference is that the v2 SDK requires the `<IsTruncated>` element in a `ListObjectsV2` response, but this element is permitted in both APIs so we add it here.	2025-04-04 18:40:27 +11:00
Nhat Nguyen	81555cc9d9	Allow partial results by default in ES|QL (#125060 ) With this change, ES|QL will return partial results instead of failing the entire query when encountering errors. Callers should check the partial_results flag in the response to determine if the result is partial or complete. If returning partial results is not desired, this option can be overridden per request via the allow_partial_results parameter in the query URL or globally via the cluster setting esql.allow_partial_results. Relates #122802	2025-04-03 12:30:47 -07:00
Ben Chaplin	9f6eb1d4e3	Log stack traces on data nodes before they are cleared for transport (#125732 ) We recently started clearing stack traces on data nodes before transport back to the coordinating node when error_trace=false, to reduce unnecessary data transfer and memory usage on the coordinating node (#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from the logs entirely. This change logs stack traces directly on the data node when error_trace=false.	2025-04-03 13:45:09 -04:00
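A minimal sketch of the "log, then clear" ordering from the commit above. It uses `java.util.logging` to stay self-contained (Elasticsearch itself logs through Log4j), the `FINE` level is an assumption, and `DataNodeErrors.prepareForTransport` is a hypothetical name.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

final class DataNodeErrors {
    private static final Logger LOGGER = Logger.getLogger(DataNodeErrors.class.getName());

    private DataNodeErrors() {}

    static <E extends Throwable> E prepareForTransport(E e, boolean errorTrace) {
        if (!errorTrace) {
            // Log first, while the full stack trace is still available on this node...
            LOGGER.log(Level.FINE, "failure on data node", e);
            // ...then clear the traces on the exception and its causes before sending it back.
            // The identity set guards against a cyclic cause chain.
            Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
            for (Throwable t = e; t != null && seen.add(t); t = t.getCause()) {
                t.setStackTrace(new StackTraceElement[0]);
            }
        }
        return e;
    }
}
```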
Mary Gouseti	488951edf3	Data stream lifecycle does not record error in failure store rollover (#126229 ) Issue: The data stream lifecycle does not correctly register rollover errors for the failure store. Observed behaviour: When the data stream lifecycle encounters a rollover error, it records it unless it sees that the current write index of this data stream doesn't match the source index of the request. However, the write index check does not use the failure write index but the write backing index, so the failure gets ignored. Desired behaviour: When the data stream lifecycle encounters a rollover error, it will check the relevant write index before it determines whether it should be recorded or not.	2025-04-04 03:44:09 +11:00
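The fix in the data stream lifecycle commit above comes down to choosing which write index to compare against. `DataStreamSketch` and `RolloverErrorFilter` are hypothetical, simplified stand-ins for the real data stream metadata and lifecycle service.

```java
/** Hypothetical, simplified data stream: one write index per component. */
record DataStreamSketch(String writeIndex, String failureWriteIndex) {}

final class RolloverErrorFilter {
    private RolloverErrorFilter() {}

    /**
     * Record a rollover error only if the index that failed to roll over is still the
     * current write index of the component the rollover targeted: the failure store's
     * write index for failure-store rollovers, otherwise the backing write index.
     */
    static boolean shouldRecordError(DataStreamSketch dataStream, String sourceIndex, boolean targetsFailureStore) {
        String relevantWriteIndex = targetsFailureStore ? dataStream.failureWriteIndex() : dataStream.writeIndex();
        return sourceIndex.equals(relevantWriteIndex);
    }
}
```

Comparing a failure-store source index against the backing write index, as the observed behaviour describes, would always return `false` and so silently drop the error.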
David Turner	9b353f69a7	Fix `CommonPrefixes` rendering in `S3HttpHandler` (#126147 ) Today the `ListObjects` implementation in `S3HttpHandler` will put all the common prefixes in a single `<CommonPrefixes>` container, but in fact the real S3 gives each one its own container. The v1 SDK is lenient and accepts either, but the v2 SDK requires us to do this correctly. This commit fixes the test fixture to match the behaviour of the real S3.	2025-04-03 07:55:07 +01:00
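The corrected `CommonPrefixes` shape from the commit above, as a sketch. `ListObjectsXml` is hypothetical, and a real fixture would also XML-escape the prefix values, which this omits.

```java
import java.util.List;

final class ListObjectsXml {
    private ListObjectsXml() {}

    /** Renders one <CommonPrefixes> element per prefix, as real S3 does. */
    static String commonPrefixes(List<String> prefixes) {
        StringBuilder xml = new StringBuilder();
        for (String prefix : prefixes) {
            // No XML escaping here; prefixes are assumed to be plain path segments.
            xml.append("<CommonPrefixes><Prefix>").append(prefix).append("</Prefix></CommonPrefixes>");
        }
        return xml.toString();
    }
}
```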
Nick Tindall	58c8f4abae	Upgrade to latest GCS SDK (#126087 ) Upgrades google cloud SDK used by repository-gcs to com.google.cloud:google-cloud-storage-bom:2.50.0 Closes: ES-9287	2025-04-02 15:41:50 +11:00


5763 commits