This PR updates `bc-fips` and `bctls-fips` dependencies to the latest
minor versions.
(cherry picked from commit 6ea3e01958)
Co-authored-by: Slobodan Adamović <slobodanadamovic@users.noreply.github.com>
This updates the Gradle wrapper to 8.12.
We addressed the deprecation warnings caused by the update, including:
- Fix change in TestOutputEvent API
- Fix deprecation in groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
(cherry picked from commit ba61f8c7f7)
# Conflicts:
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/distribution/DockerCloudElasticsearchDistributionType.java
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/distribution/DockerUbiElasticsearchDistributionType.java
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/test/Fixture.java
# plugins/repository-hdfs/hadoop-client-api/build.gradle
# server/src/main/java/org/elasticsearch/inference/ChunkingOptions.java
# x-pack/plugin/kql/build.gradle
# x-pack/plugin/migrate/build.gradle
# x-pack/plugin/security/qa/security-basic/build.gradle
[esql] > Unexpected error from Elasticsearch: illegal_state_exception - sink exchanger for id [ruxoDDxXTGW55oIPHoCT-g:964613010] already exists.
This issue occurs when two or more clusterAliases point to the same
physical remote cluster. The exchange service assumes the destination is
unique, which is not true in this topology. This PR addresses the
problem by appending a suffix derived from a monotonically increasing number,
ensuring that distinct exchanges are created in such cases.
Another issue arising from this behavior is that data on a remote
cluster is processed multiple times, leading to incorrect results. I can
work on the fix for this once we agree that this is an issue.
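A minimal, self-contained sketch of the de-duplication idea, using hypothetical names rather than the actual exchange service code: a per-node counter makes the sink id unique even when several cluster aliases resolve to the same physical remote cluster.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the fix: append a monotonically increasing
// suffix so that two requests targeting the same physical cluster never
// collide on the same sink exchanger id.
public class ExchangeIds {
    private static final AtomicLong SINK_SEQUENCE = new AtomicLong();

    // e.g. "ruxoDDxXTGW55oIPHoCT-g:964613010" -> "ruxoDDxXTGW55oIPHoCT-g:964613010/3"
    public static String uniqueSinkId(String baseId) {
        return baseId + "/" + SINK_SEQUENCE.incrementAndGet();
    }
}
```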
We don't seem to have a test that fully verifies that an S3
repository can reload credentials from an updated keystore. This commit
adds such a test.
Backport of #116762 to 8.16.
The fetch phase is subject to timeouts like any other search phase. Timeouts
may happen when low-level cancellation is enabled (true by default), in which case the
directory reader is wrapped in an ExitableDirectoryReader and a timeout is
provided to the search request.
The exception used is TimeExceededException, but it is an internal
exception that should never be returned to the user. When it is thrown, we
need to catch it and either throw an error or mark the response as timed out,
depending on whether partial results are allowed.
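A rough sketch of that handling, with hypothetical types standing in for the internal TimeExceededException and the fetch machinery: the internal exception is caught and either rethrown as a user-facing error or converted into a partial, timed-out response.

```java
// Hypothetical sketch: the internal timeout marker must never escape to the user.
public class FetchTimeoutHandling {

    // stand-in for the internal, non-public TimeExceededException
    static class TimeExceededException extends RuntimeException {}

    interface FetchResult { void markTimedOut(); }

    static void runFetchPhase(Runnable fetch, FetchResult result, boolean allowPartialResults) {
        try {
            fetch.run();
        } catch (TimeExceededException e) {
            if (allowPartialResults) {
                // surface the partial results and flag the response as timed out
                result.markTimedOut();
            } else {
                // partial results are not allowed: fail the request with a user-facing error
                throw new RuntimeException("Time exceeded during fetch phase", e);
            }
        }
    }
}
```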
A `CompleteMultipartUpload` action may fail after sending the `200 OK`
response line. In this case the response body describes the error, and
the SDK translates this situation to an exception with status code 0 but
with the `ErrorCode` string set appropriately. This commit enhances the
exception handling in `S3BlobContainer` to handle this possibility.
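A hedged sketch of the detection logic (not the exact `S3BlobContainer` code, and assuming the v1 AWS SDK): the post-200 failure surfaces as an `AmazonS3Exception` whose HTTP status code is 0 but whose error code string is populated.

```java
import com.amazonaws.services.s3.model.AmazonS3Exception;

// Illustrative only: treat a status-0 exception with a populated error code as a
// CompleteMultipartUpload failure that occurred after the "200 OK" response line.
public class CompleteMultipartUploadErrors {
    public static boolean failedAfterOkResponse(AmazonS3Exception e) {
        return e.getStatusCode() == 0
            && e.getErrorCode() != null
            && e.getErrorCode().isEmpty() == false;
    }
}
```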
Closes #102294
Co-authored-by: Pat Patterson <metadaddy@gmail.com>
* Allow for queries on _tier to skip shards during coordinator rewrite (#114990)
The `_tier` metadata field was not used on the coordinator when
rewriting queries in order to exclude shards that don't match. This led
to queries of the following form continuing to report failures even
though the only unavailable shards were in the tier that was excluded
from the search (the frozen tier in this example):
```
POST testing/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "_tier": "data_frozen"
          }
        }
      ]
    }
  }
}
```
This PR addresses this by having the queries that can execute on `_tier`
(term, match, query string, simple query string, prefix, wildcard)
execute a coordinator rewrite to exclude the indices that don't match
the `_tier` query **before** attempting to reach the shards (shards
that might not be available and would raise errors).
Fixes #114910
* Don't use getFirst
* Test compile
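A simplified sketch of the coordinator-rewrite idea described above, using hypothetical types (the real implementation lives in the query builders and the coordinator rewrite context): if the coordinator already knows which tier an index lives in, a `_tier` term query can be answered without contacting the shards.

```java
// Hypothetical illustration: decide on the coordinator whether an index can match
// a term query on "_tier", so excluded tiers (e.g. data_frozen) are skipped before
// any request reaches their possibly-unavailable shards.
public class TierCoordinatorRewrite {

    record IndexTier(String indexName, String tier) {}

    static boolean canMatchTierTerm(IndexTier index, String requiredTier) {
        return index.tier().equals(requiredTier);
    }

    // In a must_not clause the decision is inverted: an index in the excluded tier
    // cannot contribute any hits and can be skipped entirely, which is what avoids
    // the failures described above.
    static boolean canMatchMustNotTierTerm(IndexTier index, String excludedTier) {
        return canMatchTierTerm(index, excludedTier) == false;
    }
}
```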
Long GC disruption relies on Thread.resume, which is removed in JDK 23.
Tests that use it predate the more modern disruption tests. This commit
removes GC disruption and the master disruption tests. Note that tests
relying on this scheme have already not been running since JDK 20, which first
deprecated Thread.resume.
Here we check for the existence of a `host.name` field in the index sort settings
when the index mode is `logsdb`, and decide whether to inject the field in the mapping
depending on whether it exists. By default `host.name` is required for
sorting in LogsDB. Injecting the `host.name` field only when strictly required
reduces the chance of errors at mapping or template composition time.
A user who wants to override the index sort settings without including
a `host.name` field can do so without finding an additional, automatically injected
`host.name` field in the mappings. If users override the
sort settings and a `host.name` field is not included, we don't need
to inject such a field since sorting no longer requires it.
As a result of this change we have the following:
* the user does not provide any index sorting configuration: we are responsible for injecting the default sort fields and their mapping (for `logsdb`)
* the user explicitly provides non-empty index sorting configuration: the user is also responsible for providing correct mappings and we do not modify index sorting or mappings
Note also that all sort settings `index.sort.*` are `final` which means doing this
check once, when mappings are merged at template composition time, is enough.
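A hedged sketch of the decision rule (hypothetical method and parameter names, not the actual mapper/settings code): `host.name` is injected only when the index mode is `logsdb` and the user has not supplied their own `index.sort.*` configuration.

```java
// Illustrative only: mirror the rule described above.
public class LogsDbHostNameInjection {
    /**
     * @param indexMode            the configured index mode, e.g. "logsdb"
     * @param userProvidedSortConf true if the user set any index.sort.* settings
     */
    static boolean shouldInjectHostName(String indexMode, boolean userProvidedSortConf) {
        // Default logsdb sorting requires host.name, so inject it only when we also
        // own the sort configuration; if the user overrides index.sort.*, they own
        // the mappings too and we leave both untouched.
        return "logsdb".equals(indexMode) && userProvidedSortConf == false;
    }
}
```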
(cherry picked from commit 9bf6e3b0ba)
JNA has a static thread which handles cleaning up native memory
references. This commit adds the thread name to those filtered out of
thread leak detection, since it lives for the lifetime of the JVM (yet
might be started in the middle of a test).
closes #114555
* Create a fluent builder to help implement ChunkedToXContent (#112389)
Rather than manually adding startObject/endObject and having to line everything up yourself, this handles the start/end for you (a rough sketch of the idea follows below).
A few implementations are converted already. In the long run, I would like this to replace ChunkedXContentHelper.
* Convert a few more implementations to ChunkedXContentBuilder (#113125)
Remove the complex methods from ChunkedXContentHelper
* Further conversions to ChunkedXContentBuilder (#114237)
---------
Co-authored-by: Simon Cooper <simon.cooper@elastic.co>
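A hypothetical miniature of the idea, not the real ChunkedXContentBuilder API: the builder emits the matching "end object" itself, so callers can no longer forget to pair startObject/endObject by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy fluent builder: the object(...) method closes the object for the caller.
public class TinyObjectBuilder {
    private final List<String> events = new ArrayList<>();

    public TinyObjectBuilder field(String name, Object value) {
        events.add(name + "=" + value);
        return this;
    }

    public TinyObjectBuilder object(String name, Consumer<TinyObjectBuilder> contents) {
        events.add("startObject " + name);
        contents.accept(this);       // caller only describes the body...
        events.add("endObject");     // ...the builder closes the object itself
        return this;
    }

    public List<String> events() {
        return events;
    }
}
```

Usage would look something like `builder.object("user", b -> b.field("name", "kimchy"))`, with the closing event supplied by the builder rather than the caller.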
The ST_DISTANCE function added in #108764 was optimized for Lucene pushdown in a series of follow-up PRs, but this did not include sorting by distance. That is now resolved for two key scenarios, both known to be valued by users:
* Sorting by distance:
`FROM index | EVAL distance=ST_DISTANCE(field, literal) | SORT distance`
* Sorting and filtering by distance:
`FROM index | EVAL distance=ST_DISTANCE(field, literal) | WHERE distance < literal | SORT distance`
The key changes required to make this work:
* Add the appropriate sort->_geo_distance sort type to the EsQueryExec
* Enhance PushTopNToSource to understand how to push down the sort even when there is an EVAL in between the FROM and the SORT (between the TopNExec and the EsQueryExec in the physical plan).
* Enhance PushFiltersToSource to understand how to push down the filter even when there is an EVAL in between the FROM and the WHERE (between the Filter and the EsQueryExec in the physical plan).
A useful bonus of this additional EVAL intelligence is that other, non-spatial cases are now also pushed down. In particular, EVALs that are simple aliases are considered and pushed down, for both filtering and sorting.
Local benchmark results are very approximate, but they show massive improvements for distanceSort and distanceFilterSort, which relate to the two cases listed above.
| Benchmark | Query DSL | ESQL before this PR | ESQL after this PR | Comments |
|---|---|---|---|---|
| distanceFilter | 10 | 5 | 5 | Optimized in #109972 |
| distanceEvalFilter | 10 | 10000 | 1500 | Still slow due to unnecessary EVAL |
| distanceSort | 150 | 12000 | 160 | |
| distanceFilterSort | 20 | 10000 | 24 | |
NOTE: This enables pushing down sorting by any ReferenceAttribute that either refers to a sortable FieldAttribute, or to an StDistance function that itself refers to a suitable FieldAttribute of geo_point type.
---------
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
`ThreadContext#stashContext` doesn't guarantee a clean thread
context, but it's important that we don't allow the callers' thread contexts
to leak into the cluster state update. This commit captures the desired
thread context at startup rather than using `stashContext` when forking
the processor.
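A generic sketch of the pattern, with a plain map standing in for the actual `ThreadContext` API: capture the desired context once at startup and restore it around the forked task, instead of relying on stashing whatever context the caller happens to have.

```java
import java.util.Map;

// Hypothetical illustration of "capture at startup, restore when forking":
// the forked task always runs with the context captured when the processor was
// created, so a caller's context cannot leak into the cluster state update.
public class CapturedContextRunner {

    private final Map<String, String> capturedContext;

    public CapturedContextRunner(Map<String, String> contextAtStartup) {
        // defensive copy taken once, when the component is constructed
        this.capturedContext = Map.copyOf(contextAtStartup);
    }

    public Runnable wrap(Runnable task, ThreadLocal<Map<String, String>> currentContext) {
        return () -> {
            Map<String, String> previous = currentContext.get();
            currentContext.set(capturedContext);   // restore the captured context
            try {
                task.run();
            } finally {
                currentContext.set(previous);      // put the caller's context back
            }
        };
    }
}
```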
Delay construction of `Warnings` until they are needed to save memory
when evaluating many many many expressions. Most expressions won't use
warnings at all and there isn't any need to make registering warnings
super duper fast. So let's make the construction lazy to save a little
memory. It's like 200 bytes per expression which isn't much, but it's
possible to have thousands of expressions in a single query. Abusive,
but possible.
This also consolidates all `Warnings` usages to a single `Warnings`
class. We had two. We don't need two.
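A small sketch of the lazy-construction pattern with a hypothetical `Warnings` stand-in: the object is only allocated on the first warning, so the common no-warnings case pays nothing beyond a null field.

```java
// Illustrative only: allocate the warnings collector on first use.
// Not thread-safe; assumes an evaluator is used by a single driver thread.
public class LazyWarnings {

    static class Warnings {
        int count;
        void register(String warning) {
            count++;
        }
    }

    private Warnings warnings;   // null until the first warning is registered

    public void registerWarning(String warning) {
        if (warnings == null) {
            warnings = new Warnings();   // pay the ~200 bytes only when needed
        }
        warnings.register(warning);
    }
}
```

The trade-off is one extra null check per warning, which is negligible given that registering warnings does not need to be fast.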
Currently we have a relatively basic decider for when to throttle
indexing. This commit adds two levels of watermarks with configurable
bulk size deciders. It also adds settings to control
primary, coordinating, and replica rejection limits.
Part of https://github.com/elastic/elasticsearch/issues/99815
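Purely as an illustration of how two watermarks and a bulk-size decider could interact; the actual setting names and semantics in this commit are not reproduced here, and everything below is an assumption.

```java
// Assumption-laden illustration: below the low watermark nothing is throttled,
// above the high watermark everything is throttled, and in between only bulks
// over a configurable size are throttled. The real decider may differ.
public class TwoWatermarkDecider {
    private final long lowWatermarkBytes;
    private final long highWatermarkBytes;
    private final long largeBulkSizeBytes;

    public TwoWatermarkDecider(long lowWatermarkBytes, long highWatermarkBytes, long largeBulkSizeBytes) {
        this.lowWatermarkBytes = lowWatermarkBytes;
        this.highWatermarkBytes = highWatermarkBytes;
        this.largeBulkSizeBytes = largeBulkSizeBytes;
    }

    public boolean shouldThrottle(long bytesInFlight, long incomingBulkBytes) {
        if (bytesInFlight < lowWatermarkBytes) {
            return false;                                   // plenty of headroom
        }
        if (bytesInFlight >= highWatermarkBytes) {
            return true;                                    // above high watermark: throttle
        }
        return incomingBulkBytes >= largeBulkSizeBytes;     // between watermarks: only large bulks
    }
}
```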
## Steps
1. Migrate TDigest classes to use a custom Array implementation. Temporarily use a simple array wrapper (https://github.com/elastic/elasticsearch/pull/112810)
2. Implement CircuitBreaking in the `WrapperTDigestArrays` class. Add Releasable/AutoCloseable and ensure everything is closed (https://github.com/elastic/elasticsearch/pull/113105) (see the sketch below)
3. Pass the CircuitBreaker as a parameter to TDigestState from wherever it's being used (This PR)
   - ESQL: Pass a real CB
   - Other aggs: Use the deprecated methods on `TDigestState`, that will use a No-op CB instead
4. Account remaining TDigest classes size ("SHALLOW_SIZE")

Every step should be safely mergeable to main:
- The first and second steps should have no impact.
- The third and fourth ones will start increasing the CB count partially.

## Remarks
TDigestStates are Releasable, and should be closed now. However, old aggregations don't close them, as it's not trivial, and as they are using the NoopCircuitBreaker, there's no need to close them.
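A hedged sketch of step 2's idea, with a hypothetical `CircuitBreaker` interface rather than the real `WrapperTDigestArrays`: the array wrapper accounts its bytes against a breaker on allocation and releases them on close.

```java
// Illustrative only: account array memory against a breaker and release on close.
public class BreakingDoubleArray implements AutoCloseable {

    public interface CircuitBreaker {
        void addEstimateBytesAndMaybeBreak(long bytes);   // may throw if over the limit
        void addWithoutBreaking(long bytes);              // called with a negative value to release
    }

    private final CircuitBreaker breaker;
    private final double[] values;
    private final long sizeInBytes;

    public BreakingDoubleArray(CircuitBreaker breaker, int length) {
        this.breaker = breaker;
        this.sizeInBytes = (long) length * Double.BYTES;
        breaker.addEstimateBytesAndMaybeBreak(sizeInBytes);   // reserve before allocating
        this.values = new double[length];
    }

    public void set(int index, double value) {
        values[index] = value;
    }

    public double get(int index) {
        return values[index];
    }

    @Override
    public void close() {
        breaker.addWithoutBreaking(-sizeInBytes);   // give the reserved bytes back
    }
}
```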
* Add assertWarnings capabilities to base token stream test case (#113619)
We need to be able to assert various warnings and check for them in
typical token stream tests. This adds that capability.
* fixing test
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
It's possible that the expected thread isn't the only thread that made
no progress since the last check, so this commit generalizes the
assertion to allow for other threads to be mentioned here too.
Closes #113734
Enhance ES|QL responses to include information about `took` time (search latency), shards, and
clusters against which the query was executed.
The goal of this PR is to begin to provide parity between the metadata displayed for
cross-cluster searches in _search and ES|QL.
This PR adds the following features:
- add overall `took` time to all ES|QL query responses. To emphasize, "all" here
means async search, sync search, and both local-only and cross-cluster searches, so it goes
beyond just CCS.
- add `_clusters` metadata to the final response for cross-cluster searches, for both
async and sync search (see example below)
- tracking/reporting counts of skipped shards from the can_match (SearchShards API)
phase of ES|QL processing
- marking clusters as skipped if they cannot be connected to (during the field-caps
phase of processing)
Out of scope for this PR:
- honoring the `skip_unavailable` cluster setting
- showing `_clusters` metadata in the async response **while** the search is still running
- showing any shard failure messages (since any shard search failures in ES|QL are
automatically fatal and _clusters/details is not shown in 4xx/5xx error responses). Note that
this also means that the `failed` shard count is always 0 in the ES|QL `_clusters` section.
Things changed with respect to behavior in `_search`:
- the `timed_out` field in `_clusters/details/mycluster` was removed in the ESQL
response, since ESQL does not support timeouts. It could be added back later
if/when ESQL supports timeouts.
- the `failures` array in `_clusters/details/mycluster/_shards` was removed in the ESQL
response, since any shard failure causes the whole query to fail.
Example output from ES|QL CCS:
```es
POST /_query
{
  "query": "from blogs,remote2:bl*,remote1:blogs|\nkeep authors.first_name,publish_date|\n limit 5"
}
```
```json
{
  "took": 49,
  "columns": [
    {
      "name": "authors.first_name",
      "type": "text"
    },
    {
      "name": "publish_date",
      "type": "date"
    }
  ],
  "values": [
    [
      "Tammy",
      "2009-11-04T04:08:07.000Z"
    ],
    [
      "Theresa",
      "2019-05-10T21:22:32.000Z"
    ],
    [
      "Jason",
      "2021-11-23T00:57:30.000Z"
    ],
    [
      "Craig",
      "2019-12-14T21:24:29.000Z"
    ],
    [
      "Alexandra",
      "2013-02-15T18:13:24.000Z"
    ]
  ],
  "_clusters": {
    "total": 3,
    "successful": 2,
    "running": 0,
    "skipped": 1,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "successful",
        "indices": "blogs",
        "took": 43,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote2": {
        "status": "skipped", // remote2 was offline when this query was run
        "indices": "remote2:bl*",
        "took": 0,
        "_shards": {
          "total": 0,
          "successful": 0,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote1": {
        "status": "successful",
        "indices": "remote1:blogs",
        "took": 47,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
```
Fixes https://github.com/elastic/elasticsearch/issues/112402 and https://github.com/elastic/elasticsearch/issues/110935