5154 commits

Author SHA1 Message Date
Kathleen DeRusso
e280aa5d50
Revert semantic_text model registry changes (#127075) 2025-04-18 18:36:33 -04:00
James Baiera
7b89f4d4a6
Add ability to redirect ingestion failures on data streams to a failure store (#126973)
Removes the feature flags and guards that prevent the new failure store functionality 
from operating in production runtimes.
2025-04-18 16:33:03 -04:00
Armin Braun
f461f90d48
Remove redundant marker interfaces that extend Bucket (#127038)
No need to have these marker interfaces around when we're not using them anywhere; all they do is hide a lot of code duplication. Removing them sets up the possible removal of hundreds of lines of downstream code.
2025-04-18 18:26:39 +02:00
Niels Bauman
a81c4491f0
Fix timeout for awaiting index existence (#126773)
#126692 allowed consumers to specify a timeout to `awaitIndexExists`,
but that timeout did not get propagated correctly to all the required
places.
2025-04-18 11:27:52 +02:00
Oleksandr Kolomiiets
62c0629da6
Add new-style block loader tests for constant_keyword, version, wildcard (#126968) 2025-04-17 13:22:09 -07:00
Niels Bauman
16070a342f
Fix tests in TimeSeriesDataStreamsIT (#126851)
These tests had the potential to fail when subsequent requests would hit
different nodes with different versions of the cluster state.

Only one of these tests failed already, but we fix the other ones
proactively to avoid future failures.

Fixes #126746
2025-04-17 16:35:43 +02:00
Kathleen DeRusso
a72883e8e3
Default new semantic_text fields to use BBQ when models are compatible (#126629)
* Default new semantic_text fields to use BBQ when models are compatible

* Update docs/changelog/126629.yaml

* Gate default BBQ by IndexVersion

* Cleanup from PR feedback

* PR feedback

* Fix test

* Fix test

* PR feedback

* Update test to test correct options

* Hack alert: Fix issue where mapper service was always being created with current index version
2025-04-17 08:25:10 -04:00
Nik Everett
128144dd6d
ESQL: Add documents_found and values_loaded (#125631)
This adds `documents_found` and `values_loaded` to the ESQL response:
```json
{
  "took" : 194,
  "is_partial" : false,
  "documents_found" : 100000,
  "values_loaded" : 200000,
  "columns" : [
    { "name" : "a", "type" : "long" },
    { "name" : "b", "type" : "long" }
  ],
  "values" : [[10, 1]]
}
```

These are cheap enough to collect that we can do it for every query and
return it with every response. It's small, but it still gives you a
reasonable sense of how much work Elasticsearch had to go through to
perform the query.

I've also added these two fields to the driver profile and task status:
```json
    "drivers" : [
      {
        "description" : "data",
        "cluster_name" : "runTask",
        "node_name" : "runTask-0",
        "start_millis" : 1742923173077,
        "stop_millis" : 1742923173087,
        "took_nanos" : 9557014,
        "cpu_nanos" : 9091340,
        "documents_found" : 5,   <---- THESE
        "values_loaded" : 15,    <---- THESE
        "iterations" : 6,
...
```

These are at a high level and should be easy to reason about. We'd like to
extract this into a "show me how difficult this running query is" API one
day. But today, just plumbing it into the debugging output is good.

Any `Operator` can claim to "find documents" or "load values" by overriding
a method on its `Operator.Status` implementation:
```java
/**
 * The number of documents found by this operator. Most operators
 * don't find documents and will return {@code 0} here.
 */
default long documentsFound() {
    return 0;
}

/**
 * The number of values loaded by this operator. Most operators
 * don't load values and will return {@code 0} here.
 */
default long valuesLoaded() {
    return 0;
}
```

In this PR all of the `LuceneOperator`s declare that each `position` they
emit is a "document found" and the `ValuesSourceReaderOperator`
says each value it makes is a "value loaded". That's pretty much
true. The `LuceneCountOperator` and `LuceneMinMaxOperator` sort of pretend
that the count/min/max that they emit is a "document" - but that's good
enough to give you a sense of what's going on. It's *like* a document.
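
A minimal sketch of how the two counters might roll up, assuming a reduced stand-in interface. `StatusSketch`, `SourceStatus`, `ReaderStatus`, `PassThroughStatus` and `StatusTotals` are hypothetical names for illustration only, not the real Elasticsearch types:
```java
import java.util.List;

/** Hypothetical stand-in for the status interface described above. */
interface StatusSketch {
    default long documentsFound() {
        return 0;
    }

    default long valuesLoaded() {
        return 0;
    }
}

/** A source-like status: every emitted position counts as a document found. */
record SourceStatus(long positionsEmitted) implements StatusSketch {
    @Override
    public long documentsFound() {
        return positionsEmitted;
    }
}

/** A reader-like status: every value it produced counts as a value loaded. */
record ReaderStatus(long valuesRead) implements StatusSketch {
    @Override
    public long valuesLoaded() {
        return valuesRead;
    }
}

/** Keeps both defaults, like most operators. */
record PassThroughStatus() implements StatusSketch {}

final class StatusTotals {
    private StatusTotals() {}

    /** Sums documentsFound() over every status in a driver. */
    static long totalDocumentsFound(List<? extends StatusSketch> statuses) {
        return statuses.stream().mapToLong(StatusSketch::documentsFound).sum();
    }

    /** Sums valuesLoaded() over every status in a driver. */
    static long totalValuesLoaded(List<? extends StatusSketch> statuses) {
        return statuses.stream().mapToLong(StatusSketch::valuesLoaded).sum();
    }
}
```
Because the defaults return `0`, operators that neither find documents nor load values need no changes to take part in the totals.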
2025-04-16 17:15:25 +02:00
Andrei Dan
e74c237059
Enable online prewarming SPI in integration tests (#126777)
Integration tests use the MockNode. This adds the SPI lookup
when building the MockSearchService. This will enable us to
have the online prewarming implementation available in
ESIntegTestCase.
2025-04-16 14:01:36 +01:00
Ievgen Degtiarenko
07cb14e7a9
Expose more detailed profiling information (#126525) 2025-04-15 12:27:31 +02:00
Nick Tindall
358b724bd8
Deduplicate monitoring of balancer settings (#126752) 2025-04-15 16:58:27 +10:00
Jim Ferenczi
46c3657255
Fix and unmute SemanticInferenceMetadataFieldsRecoveryTests (#126784)
Use the TranslogOperationAsserter to compare the raw operations.

Closes #124383
Closes #124384
Closes #124385
2025-04-15 08:36:20 +02:00
Ryan Ernst
83ce15ae06
Make TransportRequest an interface (#126733)
In order to support a future TransportRequest variant that accepts the
response type, TransportRequest needs to be an interface. This commit
adds AbstractTransportRequest as a concrete implementation and makes
TransportRequest a simple interface that joins together the parent
interfaces from TransportMessage.

Note that this was done entirely in Intellij using structural find and
replace.
2025-04-14 14:22:28 -07:00
Mary Gouseti
e461717627
Test fix: align timeouts in testDataStreamLifecycleDownsampleRollingRestart (#123769) (#126682)
Recently we changed the implementation of
`testDataStreamLifecycleDownsampleRollingRestart` to use a temporary
state listener. We missed that the listener also had a timeout that was
considerably shorter than the `safeGet` timeout we were configuring. In this PR
we align these two timeouts.

Fixes: #123769
2025-04-15 02:53:59 +10:00
Oleksandr Kolomiiets
9d18d5280a
Add block loader from stored field and source for ip field (#126644) 2025-04-11 13:37:15 -07:00
Andrei Dan
fa09255182
Online prewarming service interface docs and usage in SearchService (#126561)
This adds the interface for search online prewarming with a default NOOP
implementation. This also hooks the interface in the SearchService after
we fork the query phase to the search thread pool.
2025-04-11 17:53:50 +01:00
David Turner
800cf72e1f
Use TimeValue for timeouts in safeAwait etc. (#126509)
There's no need to force callers to deconstruct the `TimeValue` in their
possession into a `long` and a `TimeUnit`, we can do it ourselves.
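
A rough illustration of the shape of this change, using `java.time.Duration` as a stand-in for `TimeValue`. `AwaitSketch` and both `await` overloads are hypothetical and are not the test-framework methods themselves:
```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class AwaitSketch {
    private AwaitSketch() {}

    /** Older shape: the caller splits its timeout into a magnitude and a unit. */
    static boolean await(CountDownLatch latch, long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the wait as unsuccessful.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Newer shape: the caller hands over the timeout object and the helper unpacks it. */
    static boolean await(CountDownLatch latch, Duration timeout) {
        return await(latch, timeout.toNanos(), TimeUnit.NANOSECONDS);
    }
}
```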
2025-04-12 02:46:28 +10:00
Niels Bauman
507f40cd72
Fix ILMDownsampleDisruptionIT.testILMDownsampleRollingRestart (#126692)
Wait for the index to exist on the master node to ensure all nodes have
the latest cluster state.

Fixes #126495
2025-04-11 17:45:45 +02:00
Martijn van Groningen
6012590929
Improve resiliency of UpdateTimeSeriesRangeService (#126637)
If updating the `index.time_series.end_time` fails for one data stream,
then UpdateTimeSeriesRangeService should continue updating this setting for other data streams.

The following error was observed in the wild:

```
[2025-04-07T08:50:39,698][WARN ][o.e.d.UpdateTimeSeriesRangeService] [node-01] failed to update tsdb data stream end times
java.lang.IllegalArgumentException: [index.time_series.end_time] requires [index.mode=time_series]
        at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:636) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:619) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.settings.Setting.get(Setting.java:563) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.settings.Setting.get(Setting.java:535) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService.updateTimeSeriesTemporalRange(UpdateTimeSeriesRangeService.java:111) ~[?:?]
        at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService$UpdateTimeSeriesExecutor.execute(UpdateTimeSeriesRangeService.java:210) ~[?:?]
        at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1075) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1038) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:245) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1691) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1688) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1283) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1262) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-8.17.3.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) ~[?:?]
```

This resulted in the `index.time_series.end_time` index setting not being updated for any data stream, which in turn caused data loss: metrics couldn't be indexed because no suitable backing index could be resolved:

```
the document timestamp [2025-03-26T15:26:10.000Z] is outside of ranges of currently writable indices [[2025-01-31T07:22:43.000Z,2025-02-15T07:24:06.000Z][2025-02-15T07:24:06.000Z,2025-03-02T07:34:07.000Z][2025-03-02T07:34:07.000Z,2025-03-10T12:45:37.000Z][2025-03-10T12:45:37.000Z,2025-03-10T14:30:37.000Z][2025-03-10T14:30:37.000Z,2025-03-25T12:50:40.000Z][2025-03-25T12:50:40.000Z,2025-03-25T14:35:40.000Z
```
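
The general idea of isolating failures per data stream can be sketched as below. This is an assumption-laden illustration, not the actual service code: `PerItemUpdateSketch`, `Outcome` and `updateAll` are hypothetical names, and a plain map of end times stands in for cluster state.
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

final class PerItemUpdateSketch {
    private PerItemUpdateSketch() {}

    /** Result of one pass: the resulting end times plus the names whose update failed. */
    record Outcome(Map<String, Long> endTimes, List<String> failed) {}

    /**
     * Applies the update to each entry independently. A failure for one
     * entry is recorded and its old value kept, then the loop moves on.
     */
    static Outcome updateAll(Map<String, Long> endTimes, BiFunction<String, Long, Long> update) {
        Map<String, Long> result = new LinkedHashMap<>();
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : endTimes.entrySet()) {
            try {
                result.put(entry.getKey(), update.apply(entry.getKey(), entry.getValue()));
            } catch (RuntimeException e) {
                // Keep the previous value so the remaining entries still get processed.
                result.put(entry.getKey(), entry.getValue());
                failed.add(entry.getKey());
            }
        }
        return new Outcome(result, failed);
    }
}
```
With the `try` wrapped around the whole loop instead, the first exception would abandon every remaining entry, which is the failure mode described above.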
2025-04-11 12:58:10 +02:00
Tanguy Leroux
591fa87e43
Revive read/write engine lock to guard operations against resets (#126311)
This change re-introduces the engine read/write lock to guard against engine resets.

It differs from #124635 on the following:
* uses the engineMutex for creating/closing engines
* uses the reentrant r/w lock for retaining engine instances and for resetting the engine
* acquires the reentrant read lock during refreshes to prevent deadlocks during resets
* adds tests to ensure no deadlock when re-acquiring read lock in refresh listeners
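
A simplified sketch of the locking pattern, assuming a generic holder rather than the real shard and engine classes (`ResettableHolder` and its methods are hypothetical). It shows a read lock taken for use, a write lock taken for reset, and a nested read acquisition on the same thread:
```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.Supplier;

final class ResettableHolder<T> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private T current;

    ResettableHolder(T initial) {
        this.current = initial;
    }

    /** Runs the action against the current instance while holding the read lock. */
    <R> R withCurrent(Function<T, R> action) {
        lock.readLock().lock();
        try {
            return action.apply(current);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Swaps the instance under the write lock, so it waits for active readers. */
    void reset(Supplier<T> factory) {
        lock.writeLock().lock();
        try {
            current = factory.get();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Number of read holds the calling thread currently has on the lock. */
    int readHoldsByCurrentThread() {
        return lock.getReadHoldCount();
    }
}
```
Calling `withCurrent` from inside another `withCurrent` on the same thread re-acquires the read lock, which is the situation a refresh listener can end up in.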

Relates ES-11447
2025-04-10 13:37:48 +02:00
David Turner
9e0d885702
Reduce assertBusy usage in testMultipleNodes (#126582)
Relates #126501
2025-04-10 18:28:36 +10:00
Yang Wang
62636f958b
Replace assertBusy of indexExists (#126501)
Relates:
https://github.com/elastic/elasticsearch/pull/126437#pullrequestreview-2748766613
2025-04-10 10:56:52 +10:00
Brendan Cully
c1a71ff45c
BlobContainer: add copyBlob method (#125737)
* BlobContainer: add copyBlob method

If a container implements copyBlob, then the copy is
performed by the store, without client-side IO. If the store
does not provide a copy operation then the default implementation
throws UnsupportedOperationException.

This change provides implementations for the FS and S3 blob containers.
More will follow.
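
As a rough sketch of the default-throws pattern described here: the interface below is a hypothetical, much-reduced stand-in (`BlobContainerSketch`, `FsContainerSketch` and `NoCopyContainerSketch` are illustrative names, and the real `copyBlob` signature differs):
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical, much-reduced container interface for illustration. */
interface BlobContainerSketch {
    /** Default: a store without a native copy refuses instead of copying client-side. */
    default void copyBlob(String sourceName, String targetName) throws IOException {
        throw new UnsupportedOperationException("copy is not supported by this container");
    }
}

/** File-system flavour: the copy is delegated to the file system. */
final class FsContainerSketch implements BlobContainerSketch {
    private final Path root;

    FsContainerSketch(Path root) {
        this.root = root;
    }

    @Override
    public void copyBlob(String sourceName, String targetName) throws IOException {
        Files.copy(root.resolve(sourceName), root.resolve(targetName));
    }
}

/** Keeps the default and therefore does not support copying. */
final class NoCopyContainerSketch implements BlobContainerSketch {}
```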

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: David Turner <david.turner@elastic.co>
2025-04-09 10:33:01 -07:00
Gal Lalouche
953b9fbb83
ESQL: List/get query API (#124832)
This PR adds two new REST endpoints, for listing queries and getting information on a current query.

* Resolves #124827 
* Related to #124828 (initial work)

Changes from the API specified in the above issues:
* The get API is pretty initial, as we don't have a way of fetching the memory used or number of rows processed.

List queries response:
```
GET /_query/queries
// returns for each of the running queries
// query_id, start_time, running_time, query

{ "queries" : {
 "abc": {
  "id": "abc",
  "start_time_millis": 14585858875292,
  "running_time_nanos": 762794,
  "query": "FROM logs* | STATS BY hostname"
  },
 "4321": {
  "id":"4321",
  "start_time_millis": 14585858823573,
  "running_time_nanos": 90231,
  "query": "FROM orders | LOOKUP country_code ON country"
  }
 } 
}
```

Get query response:
```
GET /_query/queries/abc

{
  "id" : "abc",
  "start_time_millis": 14585858875292,
  "running_time_nanos": 762794,
  "query": "FROM logs* | STATS BY hostname",
  "coordinating_node": "oTUltX4IQMOUUVeiohTt8A",
  "data_nodes" : [ "DwrYwfytxthse49X4", "i5msnbUyWlpe86e7"]
}
```
2025-04-08 22:21:32 +03:00
David Turner
aab40b1247
Introduce TestBlobContainerBuilder (#126445)
The mostly-optional parameters to `createBlobContainer` are getting
rather numerous in this test harness which makes the tests hard to read.
This commit introduces a builder to help name the provided parameters
and skip the omitted ones.
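
A minimal sketch of the builder idea, with made-up parameters. `ContainerSettings`, `ContainerSettingsBuilder` and every field and default below are hypothetical and do not mirror the real `TestBlobContainerBuilder`:
```java
/** Hypothetical settings bag standing in for what a test harness would construct. */
record ContainerSettings(int maxRetries, int bufferSizeBytes, boolean disableChunkedEncoding, String path) {}

final class ContainerSettingsBuilder {
    // Defaults cover every parameter a test does not mention.
    private int maxRetries = 3;
    private int bufferSizeBytes = 8 * 1024;
    private boolean disableChunkedEncoding = false;
    private String path = "base";

    ContainerSettingsBuilder maxRetries(int maxRetries) {
        this.maxRetries = maxRetries;
        return this;
    }

    ContainerSettingsBuilder bufferSizeBytes(int bufferSizeBytes) {
        this.bufferSizeBytes = bufferSizeBytes;
        return this;
    }

    ContainerSettingsBuilder disableChunkedEncoding(boolean disableChunkedEncoding) {
        this.disableChunkedEncoding = disableChunkedEncoding;
        return this;
    }

    ContainerSettingsBuilder path(String path) {
        this.path = path;
        return this;
    }

    ContainerSettings build() {
        return new ContainerSettings(maxRetries, bufferSizeBytes, disableChunkedEncoding, path);
    }
}
```
A call such as `new ContainerSettingsBuilder().maxRetries(1).path("x").build()` names the two values it sets and leaves the rest at their defaults, instead of passing a row of positional `null`s.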
2025-04-09 01:52:16 +10:00
Dianna Hohensee
4b2867a0ef
Support maxConnections override in AbstractBlobContainerRetriesTestCase tests (#126435) 2025-04-08 09:55:01 -04:00
Ryan Ernst
991e80d56e
Remove unnecessary generic params from action classes (#126364)
Transport actions have associated request and response classes. However,
the base type restrictions are not necessary to duplicate when creating
a map of transport actions. Relatedly, the ActionHandler class doesn't
actually need strongly typed action type and classes since they are lost
when shoved into the node client map. This commit removes these type
restrictions and generic parameters.
2025-04-07 16:22:56 -07:00
David Turner
cedcb5ccfe
Replace TransportResponse.Empty with ActionResponse.Empty (#126400)
No need to distinguish these things any more, we can just use
`ActionResponse.Empty` everywhere.
2025-04-08 06:58:06 +10:00
David Turner
f6c1965101
Forward port changes from backport of #125562 (#126413)
The backport to `8.x` needed some changes to pass through CI; this
commit forward-ports the relevant bits of those changes back into `main`
to keep the branches aligned.
2025-04-07 19:05:06 +01:00
David Turner
fbbbdd7eec
Allow overriding blob container path in tests (#126391)
Some `AbstractBlobContainerRetriesTestCase#createBlobContainer`
implementations choose a path for the container randomly, but we have a
need for a test which re-creates the same container against a different
`S3Service` and `BlobStore` and must therefore specify the same path
each time. This commit exposes a parameter that lets callers specify a
container path.
2025-04-08 03:54:37 +10:00
Oleksandr Kolomiiets
21ff72bef4
Use FallbackSyntheticSourceBlockLoader for text fields (#126237) 2025-04-07 09:32:35 -07:00
David Turner
896598570c
Reinstate S3SearchableSnapshotsCredentialsReloadIT in FIPS JVMs (#126109)
These tests only don't work in a FIPS JVM because they use a secret key
that is unacceptably short. This commit replaces the relevant uses of
`randomIdentifier` with `randomSecretKey` so they work whether in FIPS
mode or not.
2025-04-04 18:42:09 +11:00
Ben Chaplin
9f6eb1d4e3
Log stack traces on data nodes before they are cleared for transport (#125732)
We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false.
2025-04-03 13:45:09 -04:00
Mary Gouseti
488951edf3
Data stream lifecycle does not record error in failure store rollover (#126229)
**Issue** The data stream lifecycle does not correctly register rollover
errors for the failure store.

**Observed behaviour** When the data stream lifecycle encounters a rollover
error it records it unless it sees that the current write index of this
data stream doesn't match the source index of the request. However, the
write index check does not use the failure write index but the write
backing index, so the failure gets ignored.

**Desired behaviour** When data stream lifecycle encounters a rollover
error it will check the relevant write index before it determines if it
should be recorded or not.
2025-04-04 03:44:09 +11:00
Oleksandr Kolomiiets
f3ccde6959
Use FallbackSyntheticSourceBlockLoader for point and geo_point (#125816) 2025-04-01 12:55:18 -07:00
David Turner
0d64aab4cc
Clean up request parsing in S3HttpHandler (#126034)
The `METHOD /path/components?and=query` string representation of a
request is becoming increasingly difficult to parse, with slight
variations in parsing between the implementation in `S3HttpHandler` and
the various other implementations. This commit gets rid of the
string-concatenate-and-split behaviour in favour of a proper object that
has predicates for testing all the different kinds of request that might
be made against S3.
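
To illustrate the general approach (an object with predicates rather than a concatenated string), here is a small sketch. `ParsedRequest` and its predicate names are hypothetical, the predicates are deliberately simplified, and no URL decoding is attempted:
```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parsed form of a request such as "POST /bucket/key?uploads". */
record ParsedRequest(String method, String path, Map<String, String> query) {

    /** Splits the raw query string once, so callers never re-parse a combined string. */
    static ParsedRequest parse(String method, String rawPath, String rawQuery) {
        Map<String, String> query = new LinkedHashMap<>();
        if (rawQuery != null && !rawQuery.isEmpty()) {
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    query.put(pair, ""); // flag-style parameter with no value
                } else {
                    query.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return new ParsedRequest(method, rawPath, query);
    }

    boolean hasQueryParam(String name) {
        return query.containsKey(name);
    }

    /** Simplified: a POST carrying the "uploads" parameter. */
    boolean isInitiateMultipartUploadRequest() {
        return "POST".equals(method) && hasQueryParam("uploads");
    }

    /** Simplified: a PUT carrying both "uploadId" and "partNumber". */
    boolean isUploadPartRequest() {
        return "PUT".equals(method) && hasQueryParam("uploadId") && hasQueryParam("partNumber");
    }
}
```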
2025-04-02 05:49:50 +11:00
Jordan Powers
71e74bdd66
Store arrays offsets for scaled float fields natively with synthetic source (#125793)
This patch builds on the work in #113757, #122999, #124594, #125529, and 
#125709 to natively store array offsets for scaled float fields instead of
falling back to ignored source when synthetic_source_keep: arrays.
2025-03-28 20:26:29 +01:00
Mary Gouseti
1943844d5a
Effort to fix testDataStreamLifecycleDownsampleRollingRestart #123769 (#125478) 2025-03-28 15:26:09 +02:00
Yang Wang
3568ab8eac
Migrate RepositoriesMetadata to ProjectCustom (#125398)
This PR migrates RepositoriesMetadata from Metadata#ClusterCustom to
Metadata#ProjectCustom and handles wire BWC.

Resolves: ES-10477
2025-03-28 17:53:17 +11:00
Jordan Powers
689eaf20f4
Store arrays offsets for unsigned long fields natively with synthetic source (#125709)
This patch builds on the work in #113757, #122999, #124594, and #125529 to
natively store array offsets for unsigned long fields instead of falling
back to ignored source when synthetic_source_keep: arrays.
2025-03-27 00:59:24 +02:00
Mark Vieira
930b4ab995
Convert remaining plugin projects to new test clusters framework (#125626) 2025-03-26 13:44:07 -07:00
David Turner
40095992c2
Add more addTemporaryStateListener utils (#125648)
We often call `addTemporaryStateListener` with the `ClusterService` of a
random node, or the currently elected master. This commit adds utilities
for this common pattern.
2025-03-26 21:15:18 +11:00
Ievgen Degtiarenko
11fed4502c
Improve StatementParserTests error message (#125568) 2025-03-25 14:23:18 +01:00
Niels Bauman
542a3b65a9
Fix data stream retrieval in DataStreamLifecycleServiceIT (#125195)
These tests had the potential to fail when two consecutive GET data
streams requests would hit two different nodes, where one node already
had the cluster state that contained the new backing index and the other
node didn't yet.

Caused by #122852

Fixes #124846
Fixes #124950
Fixes #124999
2025-03-24 17:43:09 +02:00
Armin Braun
50437e79d3
Cleanup missing use of StandardCharsets (#125424)
Random annoyance that I figured I'd just fix globally:
We can do a bit of a cleaner job when doing byte <-> string conversion here and there.
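
For illustration only (`CharsetSketch` is a hypothetical helper, not code from this change), the cleaner form of the conversion passes a `StandardCharsets` constant instead of a charset name, which avoids the checked `UnsupportedEncodingException` of the name-based overloads:
```java
import java.nio.charset.StandardCharsets;

final class CharsetSketch {
    private CharsetSketch() {}

    /** String to bytes with an explicit charset constant. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Bytes back to a String with the same charset constant. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```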
2025-03-21 20:10:15 +01:00
Nik Everett
e897a1422f
Aggs: Let terms run in global ords mode no match (#124782)
Allows the `terms` agg to run with global ords if the top level query
matches no docs *but* the agg is configured to collect buckets with 0
docs.
2025-03-21 13:00:25 -04:00
Armin Braun
9c8750bc8c
Stop retaining transport responses past serialization (#125163)
Remove the `OutboundMessage` class that needlessly holds on to the response instances after they are no longer needed. Inlining the logic should save considerable heap under pressure and enable further optimisations.
2025-03-21 13:08:54 +01:00
Mary Gouseti
2c377f9c85
Unify template builders for data stream options, failure store and data stream lifecycle (#125293) 2025-03-21 10:03:27 +02:00
Yang Wang
7a0a399055
[Test] Reconcile TestProjectResolvers (#124988)
This PR updates the different methods in TestProjectResolvers so that
their names are more accurate and their behaviours are more as expected.

For example, in MP-1749, we differentiate between single-project and
single-project only resolvers. The latter should not support multi-project.
2025-03-21 11:43:05 +11:00
David Turner
f04761c31a
Remove redundant response parameter to onResponseSent() (#125326)
Nobody uses this parameter (except some tests that simply verify the
otherwise-unused plumbing is connected). This commit removes it.

Relates #125163
2025-03-21 04:50:08 +11:00