Commit graph

5752 commits

Author SHA1 Message Date
Nik Everett
2e437577d8
ESQL: Create fewer documents in lookup tests (#126874)
This lowers the number of documents used to test lookup because we have
a few failures over the last few months. These are all cases that we
expect to *pass* so fewer documents should make them even more likely to
pass.

Closes #125913 Closes #125779
2025-04-16 05:56:47 +10:00
Mikhail Berezovskiy
5a7a425bd0
Refactor GCS fixture multipart parser (#125828) 2025-04-15 10:09:53 -07:00
David Turner
aa40147142
Add integ tests for ftp:// URL repository (#126757)
We document support for snapshot repositories using `ftp://` URLs but it
seems this functionality has not worked for many years because of
security-manager restrictions, although nobody noticed because it was
not covered by any tests. The migration to the Entitlements framework
means that this functionality now works again, so this commit adds tests
to make sure we do not break it again in future.
2025-04-15 12:57:00 +01:00
Ievgen Degtiarenko
07cb14e7a9
Expose more detailed profiling information (#126525) 2025-04-15 12:27:31 +02:00
Nick Tindall
358b724bd8
Deduplicate monitoring of balancer settings (#126752) 2025-04-15 16:58:27 +10:00
Jim Ferenczi
46c3657255
Fix and unmute SemanticInferenceMetadataFieldsRecoveryTests (#126784)
Use the TranslogOperationAsserter to compare the raw operations.

Closes #124383
Closes #124384
Closes #124385
2025-04-15 08:36:20 +02:00
Ryan Ernst
83ce15ae06
Make TransportRequest an interface (#126733)
In order to support a future TransportRequest variant that accepts the
response type, TransportRequest needs to be an interface. This commit
adds AbstractTransportRequest as a concrete implementation and makes
TransportRequest a simple interface that joints together the parent
interfaces from TransportMessage.

Note that this was done entirely in Intellij using structural find and
replace.
2025-04-14 14:22:28 -07:00
Mary Gouseti
e461717627
Test fix: align timeouts in testDataStreamLifecycleDownsampleRollingRestart (#123769) (#126682)
Recently we changed the implementation of
`testDataStreamLifecycleDownsampleRollingRestart` to use a temporary
state listener. We missed that the listener also had a timeout that was
quite shorter than the `safeGet` timeout we were configuring. In this PR
we align these two timeouts.

Fixes: #123769
2025-04-15 02:53:59 +10:00
Oleksandr Kolomiiets
9d18d5280a
Add block loader from stored field and source for ip field (#126644) 2025-04-11 13:37:15 -07:00
Andrei Dan
fa09255182
Online prewarming service interface docs and usage in SearchService (#126561)
This adds the interface for search online prewarming with a default NOOP
implementation. This also hooks the interface in the SearchService after
we fork the query phase to the search thread pool.
2025-04-11 17:53:50 +01:00
David Turner
800cf72e1f
Use TimeValue for timeouts in safeAwait etc. (#126509)
There's no need to force callers to deconstruct the `TimeValue` in their
possession into a `long` and a `TimeUnit`, we can do it ourselves.
2025-04-12 02:46:28 +10:00
Niels Bauman
507f40cd72
Fix ILMDownsampleDisruptionIT.testILMDownsampleRollingRestart (#126692)
Wait for the index to exist on the master node to ensure all nodes have
the latest cluster state.

Fixes #126495
2025-04-11 17:45:45 +02:00
Lorenzo Dematté
e4af657c12
Patcher improvements (HDFS) (#126449)
Patchers transform specific classes in some "broken" dependencies to ensure they behave correctly (fixing a bug, disabling some undesired or dangerous behaviour, updating calls to deprecated or removed method overloads).

If we upgrade one of the dependencies we patch, we have a concerns that the patchers may not work against the classes in the new version.
This PR addresses this concern by introducing a check on the SHA256 digest of the class, to ensure we are operating on the same bytes the patcher was designed for; if the digest changes that means the class has been changed (e.g. for a dependency update). If that happens, we break the build process with a specific error, so we can double check that the patchers still work against the new classes.

Extracted from #126326

Relates to ES-11279
2025-04-11 17:20:45 +02:00
Martijn van Groningen
6012590929
Improve resiliency of UpdateTimeSeriesRangeService (#126637)
If updating the `index.time_series.end_time` fails for one data stream,
then UpdateTimeSeriesRangeService should continue updating this setting for other data streams.

The following error was observed in the wild:

```
[2025-04-07T08:50:39,698][WARN ][o.e.d.UpdateTimeSeriesRangeService] [node-01] failed to update tsdb data stream end times
java.lang.IllegalArgumentException: [index.time_series.end_time] requires [index.mode=time_series]
        at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:636) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:619) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.settings.Setting.get(Setting.java:563) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.settings.Setting.get(Setting.java:535) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService.updateTimeSeriesTemporalRange(UpdateTimeSeriesRangeService.java:111) ~[?:?]
        at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService$UpdateTimeSeriesExecutor.execute(UpdateTimeSeriesRangeService.java:210) ~[?:?]
        at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1075) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1038) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:245) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1691) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1688) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1283) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1262) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-8.17.3.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) ~[?:?]
```

Which resulted in a situation, that causes the `index.time_series.end_time` index setting not being updated for any data stream. This then caused data loss as metrics couldn't be indexed, because no suitable backing index could be resolved:

```
the document timestamp [2025-03-26T15:26:10.000Z] is outside of ranges of currently writable indices [[2025-01-31T07:22:43.000Z,2025-02-15T07:24:06.000Z][2025-02-15T07:24:06.000Z,2025-03-02T07:34:07.000Z][2025-03-02T07:34:07.000Z,2025-03-10T12:45:37.000Z][2025-03-10T12:45:37.000Z,2025-03-10T14:30:37.000Z][2025-03-10T14:30:37.000Z,2025-03-25T12:50:40.000Z][2025-03-25T12:50:40.000Z,2025-03-25T14:35:40.000Z
```
2025-04-11 12:58:10 +02:00
David Turner
b10b35fccd
Fix S3RepositoryAnalysisRestIT (#126593)
- Translate a 404 during a multipart copy into a `FileNotFoundException`

- Use multiple threads in `S3HttpHandler` to avoid `CopyObject`/`PutObject` deadlock

Closes #126576
2025-04-11 05:41:20 +10:00
Tanguy Leroux
591fa87e43
Revive read/write engine lock to guard operations against resets (#126311)
This change re-introduces the engine read/write lock to guard against engine resets.

It differs from #124635 on the following:
    uses the engineMutex for creating/closing engines
    uses the reentrant r/w lock for retaining engine instances and for resetting the engine
    acquires the reentrant read lock during refreshes to prevent deadlocks during resets
    add tests to ensure no deadlock when re-acquiring read lock in refresh listeners

Relates ES-11447
2025-04-10 13:37:48 +02:00
David Turner
9e0d885702
Reduce assertBusy usage in testMultipleNodes (#126582)
Relates #126501
2025-04-10 18:28:36 +10:00
Yang Wang
62636f958b
Replace assertBusy of indexExists (#126501)
Relates:
https://github.com/elastic/elasticsearch/pull/126437#pullrequestreview-2748766613
2025-04-10 10:56:52 +10:00
Brendan Cully
c1a71ff45c
BlobContainer: add copyBlob method (#125737)
* BlobContainer: add copyBlob method

If a container implements copyBlob, then the copy is
performed by the store, without client-side IO. If the store
does not provide a copy operation then the default implementation
throws UnsupportedOperationException.

This change provides implementations for the FS and S3 blob containers.
More will follow.

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: David Turner <david.turner@elastic.co>
2025-04-09 10:33:01 -07:00
Ryan Ernst
3bac50e818
Use logs dir as working directory (#124966)
In the unexpected case that Elasticsearch dies due to a segfault or
other similar native issue, a core dump is useful in diagnosing the
problem. Yet core dumps are written to the working directory, which is
read-only for most installations of Elasticsearch. This commit changes
the working directory to the logs dir which should always be writeable.
2025-04-09 07:07:11 -07:00
Gal Lalouche
953b9fbb83
ESQL: List/get query API (#124832)
This PR adds two new REST endpoints, for listing queries and getting information on a current query.

* Resolves #124827 
* Related to #124828 (initial work)

Changes from the API specified in the above issues:
* The get API is pretty initial, as we don't have a way of fetching the memory used or number of rows processed.

List queries response:
```
GET /_query/queries
// returns for each of the running queries
// query_id, start_time, running_time, query

{ "queries" : {
 "abc": {
  "id": "abc",
  "start_time_millis": 14585858875292,
  "running_time_nanos": 762794,
  "query": "FROM logs* | STATS BY hostname"
  },
 "4321": {
  "id":"4321",
  "start_time_millis": 14585858823573,
  "running_time_nanos": 90231,
  "query": "FROM orders | LOOKUP country_code ON country"
  }
 } 
}
```

Get query response:
```
GET /_query/queries/abc

{
 "id" : "abc",
  "start_time_millis": 14585858875292,
  "running_time_nanos": 762794,
  "query": "FROM logs* | STATS BY hostname"
  "coordinating_node": "oTUltX4IQMOUUVeiohTt8A"
  "data_nodes" : [ "DwrYwfytxthse49X4", "i5msnbUyWlpe86e7"]
}
```
2025-04-08 22:21:32 +03:00
David Turner
aab40b1247
Introduce TestBlobContainerBuilder (#126445)
The mostly-optional parameters to `createBlobContainer` are getting
rather numerous in this test harness which makes the tests hard to read.
This commit introduces a builder to help name the provided parameters
and skip the omitted ones.
2025-04-09 01:52:16 +10:00
Dianna Hohensee
4b2867a0ef
Support maxConnections override in AbstractBlobContainerRetriesTestCase tests (#126435) 2025-04-08 09:55:01 -04:00
Ryan Ernst
991e80d56e
Remove unnecessary generic params from action classes (#126364)
Transport actions have associated request and response classes. However,
the base type restrictions are not necessary to duplicate when creating
a map of transport actions. Relatedly, the ActionHandler class doesn't
actually need strongly typed action type and classes since they are lost
when shoved into the node client map. This commit removes these type
restrictions and generic parameters.
2025-04-07 16:22:56 -07:00
David Turner
cedcb5ccfe
Replace TransportResponse.Empty with ActionResponse.Empty (#126400)
No need to distinguish these things any more, we can just use
`ActionResponse.Empty` everywhere.
2025-04-08 06:58:06 +10:00
David Turner
f6c1965101
Forward port changes from backport of #125562 (#126413)
The backport to `8.x` needed some changes to pass through CI; this
commit forward-ports the relevant bits of those changes back into `main`
to keep the branches aligned.
2025-04-07 19:05:06 +01:00
David Turner
fbbbdd7eec
Allow overriding blob container path in tests (#126391)
Some `AbstractBlobContainerRetriesTestCase#createBlobContainer`
implementations choose a path for the container randomly, but we have a
need for a test which re-creates the same container against a different
`S3Service` and `BlobStore` and must therefore specify the same path
each time. This commit exposes a parameter that lets callers specify a
container path.
2025-04-08 03:54:37 +10:00
Oleksandr Kolomiiets
21ff72bef4
Use FallbackSyntheticSourceBlockLoader for text fields (#126237) 2025-04-07 09:32:35 -07:00
David Turner
527d2a203b
Improve handling of empty response (#125562)
Today `ActionResponse$Empty` implements `ToXContentObject`, but yields
no bytes of content when serialized which creates an invalid JSON
response. This commit removes the bogus interface and adjusts the
affected REST APIs to send a `text/plain` response instead.
2025-04-07 12:10:07 +01:00
Jordan Powers
4c174a891f
Use Lucene101 postings format by default (#126080)
Update the PerFieldFormatSupplier so that new standard indices use the
Lucene101PostingsFormat instead of the current default ES812PostingsFormat.

Currently, use of the new codec is gated behind a feature flag.
2025-04-04 12:41:27 -07:00
David Turner
7239540c91
Replace region with regionSupplier in all AWS tests (#126285)
Rather than hard-coding a region name we should always auto-generate it
randomly during test execution. This commit replaces the remaining fixed
`String` arguments with a `Supplier<String>` argument to enable this.
2025-04-05 02:27:28 +11:00
Alexander Spies
8f38b13059
ESQL: Revert "Allow partial results by default in ES|QL (#125060)" (#126286)
This reverts commit 81555cc9d9 from
https://github.com/elastic/elasticsearch/pull/125060.

Fix https://github.com/elastic/elasticsearch/issues/126275

@idegtiarenko and I investigated and believe this needs reverting:
silently dropping results from the query response in case any index is
missing can lead to real problems if users don't spot their mistake. I'm
also not sure if all the results will get dropped, or only from some
nodes/shards/clusters, meaning that this might be hard to spot by users
if only some results get dropped.

The main PR has no transport version bump, no new ESQL capability, and
was merged 15h ago - so it should be safe to just revert it. I noticed
there was a linked Serverless PR on the original PR, but it merely
disabled some obsolete tests on Serverless and doesn't require reverting
itself.
2025-04-05 01:27:13 +11:00
David Turner
896598570c
Reinstate S3SearchableSnapshotsCredentialsReloadIT in FIPS JVMs (#126109)
These tests only don't work in a FIPS JVM because they use a secret key
that is unacceptably short. This commit replaces the relevant uses of
`randomIdentifier` with `randomSecretKey` so they work whether in FIPS
mode or not.
2025-04-04 18:42:09 +11:00
David Turner
24fa8eaa63
Support ListObjectsV2 in S3HttpHandler (#126189)
`ListObjects` and `ListObjectsV2` only really differ in their approach
to pagination, but today `S3HttpHandler` does not simulate pagination
anyway so we can use the same handling code for both APIs. The only
practical difference is that the v2 SDK requires the `<IsTruncated>`
element in a `ListObjectsV2` response, but this element is permitted in
both APIs so we add it here.
2025-04-04 18:40:27 +11:00
Nhat Nguyen
81555cc9d9
Allow partial results by default in ES|QL (#125060)
With this change, ES|QL will return partial results instead of failing
the entire query when encountering errors. Callers should check the
partial_results flag in the response to determine if the result is
partial or complete. If returning partial results is not desired, this
option can be overridden per request via the allow_partial_results
parameter in the query URL or globally via the cluster setting
esql.allow_partial_results.

Relates #122802
2025-04-03 12:30:47 -07:00
Ben Chaplin
9f6eb1d4e3
Log stack traces on data nodes before they are cleared for transport (#125732)
We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false.
2025-04-03 13:45:09 -04:00
Mary Gouseti
488951edf3
Data stream lifecycle does not record error in failure store rollover (#126229)
**Issue** The data stream lifecycle does not register correctly rollover
errors for failure store.

**Observed bahaviour** When data stream lifecycle encounters a rollover
error it records it unless it sees that the current write index of this
data stream doesn't match the source index of the request. However, the
write index check does not use the failure write index but the write
backing index, so the failure gets ignored

**Desired behaviour** When data stream lifecycle encounters a rollover
error it will check the relevant write index before it determines if it
should be recorded or not.
2025-04-04 03:44:09 +11:00
David Turner
9b353f69a7
Fix CommonPrefixes rendering in S3HttpHandler (#126147)
Today the `ListObjects` implementation in `S3HttpHandler` will put all
the common prefixes in a single `<CommonPrefixes>` container, but in
fact the real S3 gives each one its own container. The v1 SDK is lenient
and accepts either, but the v2 SDK requires us to do this correctly.
This commit fixes the test fixture to match the behaviour of the real
S3.
2025-04-03 07:55:07 +01:00
Nick Tindall
58c8f4abae
Upgrade to latest GCS SDK (#126087)
Upgrades google cloud SDK used by repository-gcs to com.google.cloud:google-cloud-storage-bom:2.50.0

Closes: ES-9287
2025-04-02 15:41:50 +11:00
Nick Tindall
28dd8e1bae
Make GCS HttpHandler more compliant (#126007)
- Fixed bug where 416 was being erroneously returned for zero-length blobs even with no Range header
- Fixed bug where partial upload wouldn't be completed if the last PUT included no data
- Return 206 (partial content) status when a Range header is specified
- Return an ETag on object get - BlobReadChannel uses this to ensure we fail when the blob is updated between successive chunks being fetched)
- The 416 on zero-length blobs was one of(?) the causes of #125668
2025-04-02 13:05:23 +11:00
Oleksandr Kolomiiets
f3ccde6959
Use FallbackSyntheticSourceBlockLoader for point and geo_point (#125816) 2025-04-01 12:55:18 -07:00
David Turner
0d64aab4cc
Clean up request parsing in S3HttpHandler (#126034)
The `METHOD /path/components?and=query` string representation of a
request is becoming increasingly difficult to parse, with slight
variations in parsing between the implementation in `S3HttpHandler` and
the various other implementations. This commit gets rid of the
string-concatenate-and-split behaviour in favour of a proper object that
has predicates for testing all the different kinds of request that might
be made against S3.
2025-04-02 05:49:50 +11:00
Jordan Powers
71e74bdd66
Store arrays offsets for scaled float fields natively with synthetic source (#125793)
This patch builds on the work in #113757, #122999, #124594, #125529, and 
#125709 to natively store array offsets for scaled float fields instead of
falling back to ignored source when synthetic_source_keep: arrays.
2025-03-28 20:26:29 +01:00
Mary Gouseti
1943844d5a
Effort to fix testDataStreamLifecycleDownsampleRollingRestart #123769 (#125478) 2025-03-28 15:26:09 +02:00
David Turner
2d4fb76267
Improve randomIdentifier usage in AWS tests (#125775)
Adds prefixes to various randomly-generated values to make it easier to
pin down where they're coming from in debugging sessions. Also forces
the STS expiry time to be rendered in UTC.
2025-03-28 18:33:05 +11:00
Yang Wang
3568ab8eac
Migrate RepositoriesMetadata to ProjectCustom (#125398)
This PR migrates RepositoriesMetadata from Metadata#ClusterCustom to
Metadata#ProjectCustom and handles wire BWC.

Resolves: ES-10477
2025-03-28 17:53:17 +11:00
Nick Tindall
a25677371a
Revert "Upgrade to latest GCS SDK (#124062)" (#125748)
This reverts commit 073ca0e888.
2025-03-28 17:49:30 +11:00
Rene Groeschke
b476ee19b5
Try fixing mutedTest.yml file not found (#125763) 2025-03-27 13:31:52 +01:00
David Turner
36c14bf3a5
Validate region/service in DynamicAwsCredentials (#125671)
Following on from #125559, we can validate the region and service name
in tests that use `DynamicAwsCredentials` too.
2025-03-27 06:14:40 +00:00
Mark Vieira
cb44d7a727
Bump test cluster startup timeout back to 5 minutes 2025-03-26 16:26:20 -07:00