This updates the Gradle wrapper to 8.12.
We addressed deprecation warnings due to the update, including:
- Fix change in TestOutputEvent API
- Fix deprecation in Groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
(cherry picked from commit ba61f8c7f7)
# Conflicts:
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/distribution/DockerCloudElasticsearchDistributionType.java
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/distribution/DockerUbiElasticsearchDistributionType.java
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/test/Fixture.java
# plugins/repository-hdfs/hadoop-client-api/build.gradle
# server/src/main/java/org/elasticsearch/inference/ChunkingOptions.java
# x-pack/plugin/kql/build.gradle
# x-pack/plugin/migrate/build.gradle
# x-pack/plugin/security/qa/security-basic/build.gradle
The ST_DISTANCE function added in #108764 was optimized for Lucene pushdown in a series of follow-up PRs, but this did not include sorting by distance. This PR resolves that for two key scenarios, both known to be valued by users:
* Sorting by distance:
`FROM index | EVAL distance=ST_DISTANCE(field, literal) | SORT distance`
* Sorting and filtering by distance:
`FROM index | EVAL distance=ST_DISTANCE(field, literal) | WHERE distance < literal | SORT distance`
The key changes required to make this work:
* Add the appropriate sort->_geo_distance sort type to the EsQueryExec.
* Enhance PushTopNToSource to understand how to push down the sort even when there is an EVAL between the FROM and the SORT (between the TopNExec and the EsQueryExec in the physical plan).
* Enhance PushFiltersToSource to understand how to push down the filter even when there is an EVAL between the FROM and the WHERE (between the Filter and the EsQueryExec in the physical plan).
A useful bonus of this additional EVAL intelligence is that other, non-spatial cases are now also pushed down. In particular, EVALs that are simple aliases are considered and pushed down, for both filtering and sorting.
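The alias resolution at the heart of both rules can be pictured with a small, self-contained sketch. The stand-in types below are hypothetical; the real rules operate on ES|QL physical-plan nodes such as ReferenceAttribute and FieldAttribute:
```java
import java.util.Map;

// Hypothetical sketch of the alias resolution behind PushTopNToSource and
// PushFiltersToSource; the real rules operate on ES|QL plan nodes.
public class AliasResolution {
    sealed interface Expression permits Ref, Field, Distance {}
    record Ref(String name) implements Expression {}       // stand-in for ReferenceAttribute
    record Field(String name) implements Expression {}     // stand-in for FieldAttribute
    record Distance(Field field) implements Expression {}  // stand-in for StDistance(field, literal)

    // Resolve a SORT/WHERE key through the aliases introduced by an intervening
    // EVAL, e.g. EVAL distance = ST_DISTANCE(field, literal) | SORT distance.
    static Expression resolve(Expression key, Map<String, Expression> evalAliases) {
        while (key instanceof Ref ref && evalAliases.containsKey(ref.name())) {
            key = evalAliases.get(ref.name());
        }
        // The key is pushable if it resolved to a sortable field, or to a
        // distance over a geo_point field (the _geo_distance sort case).
        return key;
    }
}
```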
Local benchmark results are very approximate, but they show massive improvements for distanceSort and distanceFilterSort, which correspond to the two cases listed above.
| Benchmark          | Query DSL | ESQL before this PR | ESQL after this PR | Comments                           |
|--------------------|-----------|---------------------|--------------------|------------------------------------|
| distanceFilter     | 10        | 5                   | 5                  | Optimized in #109972               |
| distanceEvalFilter | 10        | 10000               | 1500               | Still slow due to unnecessary EVAL |
| distanceSort       | 150       | 12000               | 160                |                                    |
| distanceFilterSort | 20        | 10000               | 24                 |                                    |
NOTE: This enables pushing down sorting by any ReferenceAttribute that either refers to a sortable FieldAttribute, or to an StDistance function that itself refers to a suitable FieldAttribute of geo_point type.
---------
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
Delay construction of `Warnings` until they are needed, to save memory
when evaluating many, many expressions. Most expressions won't use
warnings at all, and there isn't any need to make registering warnings
super fast, so let's make the construction lazy to save a little
memory. It's about 200 bytes per expression, which isn't much, but it's
possible to have thousands of expressions in a single query. Abusive,
but possible.
This also consolidates all `Warnings` usages to a single `Warnings`
class. We had two. We don't need two.
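A minimal sketch of the lazy pattern described above, with hypothetical names rather than the real `Warnings` API:
```java
// Hypothetical sketch; names and signatures do not match the real Warnings class.
class Warnings {
    Warnings(String source) { /* holds per-expression header strings (~200 bytes) */ }
    void registerException(Exception e) { /* record a deduplicated warning */ }
}

class Evaluator {
    private final String source;
    private Warnings warnings; // stays null in the common, warning-free case

    Evaluator(String source) {
        this.source = source;
    }

    private Warnings warnings() {
        if (warnings == null) {
            warnings = new Warnings(source); // built only when the first warning arrives
        }
        return warnings;
    }

    void onWarning(Exception e) {
        // Registering a warning is rare, so the null check costs nothing in
        // practice, and warning-free expressions never pay the allocation.
        warnings().registerException(e);
    }
}
```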
Enhance ES|QL responses to include information about `took` time (search latency), shards, and
clusters against which the query was executed.
The goal of this PR is to begin to provide parity between the metadata displayed for
cross-cluster searches in _search and ES|QL.
This PR adds the following features:
- add overall `took` time to all ES|QL query responses. To emphasize, "all" here
means async search, sync search, local-only, and cross-cluster searches, so it goes
beyond just CCS.
- add `_clusters` metadata to the final response for cross-cluster searches, for both
async and sync search (see example below)
- tracking/reporting counts of skipped shards from the can_match (SearchShards API)
phase of ES|QL processing
- marking clusters as skipped if they cannot be connected to (during the field-caps
phase of processing)
Out of scope for this PR:
- honoring the `skip_unavailable` cluster setting
- showing `_clusters` metadata in the async response **while** the search is still running
- showing any shard failure messages (since any shard search failure in ES|QL is
automatically fatal and `_clusters/details` is not shown in 4xx/5xx error responses). Note that
this also means that the `failed` shard count is always 0 in the ES|QL `_clusters` section.
Things changed with respect to behavior in `_search`:
- the `timed_out` field in `_clusters/details/mycluster` was removed in the ESQL
response, since ESQL does not support timeouts. It could be added back later
if/when ESQL supports timeouts.
- the `failures` array in `_clusters/details/mycluster/_shards` was removed in the ESQL
response, since any shard failure causes the whole query to fail.
Example output from ES|QL CCS:
```es
POST /_query
{
"query": "from blogs,remote2:bl*,remote1:blogs|\nkeep authors.first_name,publish_date|\n limit 5"
}
```
```json
{
  "took": 49,
  "columns": [
    { "name": "authors.first_name", "type": "text" },
    { "name": "publish_date", "type": "date" }
  ],
  "values": [
    ["Tammy", "2009-11-04T04:08:07.000Z"],
    ["Theresa", "2019-05-10T21:22:32.000Z"],
    ["Jason", "2021-11-23T00:57:30.000Z"],
    ["Craig", "2019-12-14T21:24:29.000Z"],
    ["Alexandra", "2013-02-15T18:13:24.000Z"]
  ],
  "_clusters": {
    "total": 3,
    "successful": 2,
    "running": 0,
    "skipped": 1,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "successful",
        "indices": "blogs",
        "took": 43,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote2": {
        "status": "skipped",  // remote2 was offline when this query was run
        "indices": "remote2:bl*",
        "took": 0,
        "_shards": {
          "total": 0,
          "successful": 0,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote1": {
        "status": "successful",
        "indices": "remote1:blogs",
        "took": 47,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
```
Fixes https://github.com/elastic/elasticsearch/issues/112402 and https://github.com/elastic/elasticsearch/issues/110935
This was used for some performance investigations but is not currently
needed, and it would need updating in order to complete #100878, so
this commit removes it.
* Enable corresponding validation in EsqlQueryRequest.
* Add the ESQL version to requests to /_query in integration tests.
* In mixed cluster tests for versions prior to 8.13.3, impersonate an 8.13
client and do not send any version.
---------
Co-authored-by: Nik Everett <nik9000@gmail.com>
To simplify the migration away from version-based skip checks in YAML specs,
this PR adds a synthetic version feature `gte_vX.Y.Z` for any version at or before 8.14.0.
New test specs for 8.14 or later are expected to use corresponding new cluster features,
or a test-only feature supplied via ESRestTestCase#createAdditionalFeatureSpecifications
if that is sufficient.
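A small, self-contained sketch of the intended semantics, assuming a synthetic feature reduces to a version comparison (hypothetical types, not the actual test-framework code):
```java
import java.util.List;

// Hypothetical sketch: a cluster "has" the synthetic feature gte_vX.Y.Z
// exactly when every node is on version X.Y.Z or later.
public class SyntheticVersionFeatures {
    record Version(int major, int minor, int patch) implements Comparable<Version> {
        @Override
        public int compareTo(Version o) {
            if (major != o.major) return Integer.compare(major, o.major);
            if (minor != o.minor) return Integer.compare(minor, o.minor);
            return Integer.compare(patch, o.patch);
        }

        static Version parse(String s) {
            String[] p = s.split("\\.");
            return new Version(Integer.parseInt(p[0]), Integer.parseInt(p[1]), Integer.parseInt(p[2]));
        }
    }

    static boolean hasSyntheticFeature(String feature, List<Version> nodeVersions) {
        if (feature.startsWith("gte_v") == false) {
            return false;
        }
        Version required = Version.parse(feature.substring("gte_v".length()));
        return nodeVersions.stream().allMatch(v -> v.compareTo(required) >= 0);
    }
}
```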
I investigated a heap-attack test failure and found that an ESQL request
was stuck. This occurred in the following scenario:
1. The ExchangeSource on the coordinator was blocked on reading because
there were no available pages.
2. Meanwhile, the ExchangeSink on the data node had pages ready for
fetching.
3. When an exchange request tried to fetch pages, it failed due to a
CircuitBreakingException. Despite the failure, no cancellation was
triggered because the status of the ExchangeSource on the coordinator
remained unchanged.
To fix this issue, this PR introduces two changes:
* Resume the ExchangeSourceOperator and Driver on the coordinator,
eventually allowing the coordinator to trigger cancellation of the
request when it fails to fetch pages.
* Ensure that an exchange sink on the data nodes fails when a data-node
request is cancelled. This callback was inadvertently omitted when
node-level reduction was introduced in #106204 (Run empty reduction node
level on data nodes).
I plan to spend some time hardening the exchange and compute service.
Closes #106262
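A self-contained sketch of the second change; the types below are hypothetical stand-ins for the real exchange and task classes:
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cancellation must propagate to the exchange sink so
// that buffered pages are released and remote sources observe the failure.
class CancellableTask {
    private final List<Runnable> listeners = new ArrayList<>();
    private boolean cancelled;

    synchronized void onCancelled(Runnable listener) {
        if (cancelled) {
            listener.run();
        } else {
            listeners.add(listener);
        }
    }

    synchronized void cancel() {
        cancelled = true;
        listeners.forEach(Runnable::run);
    }
}

class ExchangeSinkHandler {
    void onFailure(Exception e) {
        // Fail pending fetch requests and release buffered pages so the
        // coordinator's ExchangeSource is unblocked and can cancel the query.
    }
}

class DataNodeComputeHandler {
    void runCompute(CancellableTask task, ExchangeSinkHandler sink) {
        // The fix: register the callback that was inadvertently omitted
        // when node-level reduction was introduced.
        task.onCancelled(() -> sink.onFailure(new RuntimeException("task cancelled")));
        // ... start drivers that push pages into the sink ...
    }
}
```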
We should run heap-attack tests on multiple nodes to ensure that we
avoid causing OOM during the serialization/deserialization of exchange
responses. I've merged the required changes and run thousands of
iterations of these tests without seeing any failures.
We've seen cases of OOM errors in the test-runner process, which occur
when we convert a response to a JSON string and then parse it. We can
parse directly from the response's input stream to avoid these OOM errors.
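A minimal sketch of the difference, using Jackson rather than the tests' actual response helpers:
```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.InputStream;

class ResponseParsing {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Before: the whole body is buffered into one large String and then parsed,
    // briefly holding two copies of a potentially huge response in memory.
    static JsonNode parseViaString(String body) throws IOException {
        return MAPPER.readTree(body);
    }

    // After: parse straight from the entity's stream; no intermediate String.
    static JsonNode parseViaStream(InputStream body) throws IOException {
        return MAPPER.readTree(body);
    }
}
```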
A predicate to check whether the cluster supports a feature is available
to rest handlers defined in server. This commit adds that predicate to
plugins defining rest handlers as well.
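A self-contained sketch of what this enables, assuming the predicate has the shape `Predicate<NodeFeature>`; the handler and feature names are hypothetical:
```java
import java.util.function.Predicate;

// Hypothetical stand-in; the real NodeFeature lives in the server module.
record NodeFeature(String id) {}

class MyPluginRestHandler {
    private final Predicate<NodeFeature> clusterSupportsFeature;

    MyPluginRestHandler(Predicate<NodeFeature> clusterSupportsFeature) {
        // With this change, plugin rest handlers can receive the same
        // predicate that server-defined rest handlers already had.
        this.clusterSupportsFeature = clusterSupportsFeature;
    }

    void handleRequest() {
        if (clusterSupportsFeature.test(new NodeFeature("my_plugin.new_response_format"))) {
            // Emit the new response format only once the whole cluster supports it.
        } else {
            // Fall back to the old format, e.g. during a rolling upgrade.
        }
    }
}
```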
`ActionType` represents an action which runs on the local node, so there's
no need for implementations to define a `Reader<Response>`. This commit
removes the unused constructor argument.
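A sketch of the shape of the change, with self-contained stand-ins for the real classes (the action name and response type are hypothetical):
```java
// Hypothetical stand-in; the real ActionType lives in the server module.
class ActionType<Response> {
    final String name;

    ActionType(String name) {
        this.name = name; // no Reader<Response> constructor argument anymore
    }
}

class ExampleResponse {}

class ExampleAction {
    // Before: new ActionType<>("cluster:admin/example", ExampleResponse::new)
    // After: the reader is gone, since the action runs on the local node and
    // the response is not deserialized from the wire via this type.
    static final ActionType<ExampleResponse> INSTANCE =
            new ActionType<>("cluster:admin/example");
}
```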
I noticed we're using 5 minutes for both the query timeout and triggering
the out-of-memory action in heap-attack tests. This means that while we're
generating the heap dump, some ESQL tasks might get canceled because
the connection was disconnected. This PR increases the query timeout to
6 minutes instead.