* [8.x] Logsdb and source only snapshots.
Backporting #122199 to 8.x branch.
Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.
Also added a few more tests that snapshot and restore logsdb data streams.
* fix test
This simplifies the setup and relaxes the similarity check.
We can restrict the similarity check once we evolve the quantization
algorithm in the future.
(cherry picked from commit 2de1a3defe)
When utilizing synthetic source with nested fields, we attempt to
rebuild the child values in addition to all the parent values.
While this generally works well, it is possible that certain values are
missing from some child docs. Consequently, we iterate the vector values
incorrectly, resulting in seemingly missing values or exceptions
indicating EOFs.
closes: #122383
(cherry picked from commit f5c901e68c)
The nebula info broker plugin takes the information for the manifest from the Java project settings rather than from
the compile task configuration. Instead of setting the compiler task configuration explicitly, we now set the project
configuration accordingly. Also tweaked the javaTestCompile tasks to keep compiling with the general minimum runtime version, as we did before.
(cherry picked from commit 6e6e42f5d4)
Two of the timeout tests have been muted for several months. The reason is that we tightened the assertions to cover for partial results being returned, but there were edge cases in which partial results were not actually returned.
The timeout used in the test was time dependent, so exactly when the timeout would be thrown was unpredictable: we have timeout checks in different places in the codebase (when iterating through the leaves, before scoring any document, and while scoring documents). The edge case that caused failures is a typical timing issue where the initial check for timeout in CancellableBulkScorer already triggers the timeout, before any document has been collected.
I made several adjustments to the test to make it more robust:
- use index random to index documents, which speeds it up
- share indexing across test methods, so that it happens once at the suite level
- replace the custom query that triggers a timeout so that it is no longer a script query, but rather a Lucene query that is not time dependent and throws a time exceeded exception precisely where we expect it, so that we can test how the system reacts to that. That allows us to test that partial results are always returned when a timeout happens while scoring documents, and that partial results are never returned when a timeout happens before we have even started to score documents.
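The idea behind the last adjustment can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the test's actual query: `DeterministicTimeoutSketch`, `TimeExceededException`, `Result`, and `search` are hypothetical names, and the loop stands in for document scoring. The point is that the exception is raised at a fixed document position rather than after a wall-clock delay, so the outcome is the same on every run.

```java
import java.util.ArrayList;
import java.util.List;

final class DeterministicTimeoutSketch {
    /** Hypothetical stand-in for the exception raised when a timeout is detected. */
    static final class TimeExceededException extends RuntimeException {}

    /** Outcome of a simulated search: whether it timed out and which docs were collected. */
    static final class Result {
        final boolean timedOut;
        final List<Integer> collected;

        Result(boolean timedOut, List<Integer> collected) {
            this.timedOut = timedOut;
            this.collected = collected;
        }
    }

    /**
     * Scores docs 0..numDocs-1 and throws just before scoring doc throwAtDoc.
     * throwAtDoc == 0 models a timeout before any document has been scored;
     * a negative value means the timeout never fires.
     */
    static Result search(int numDocs, int throwAtDoc) {
        List<Integer> collected = new ArrayList<>();
        boolean timedOut = false;
        try {
            for (int doc = 0; doc < numDocs; doc++) {
                if (doc == throwAtDoc) {
                    throw new TimeExceededException();
                }
                collected.add(doc);
            }
        } catch (TimeExceededException e) {
            // Keep whatever was collected so far as the partial result.
            timedOut = true;
        }
        return new Result(timedOut, collected);
    }
}
```

With `throwAtDoc == 0` the simulated search times out with nothing collected, while any positive position below `numDocs` times out with a non-empty partial result.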
Closes #98369
Closes #98053
The aggs timeout test waits for the agg to return and then double checks
that the agg is stopped using the tasks API. We're seeing some failures
where the tasks API reports that the agg is still running. I can't
reproduce them because computers. This adds two things:
1. Logs the hot_threads so we can see if the query is indeed still
running.
2. Retries the _tasks API for a minute. If it goes away soon after the
_search returns that's *fine*. If it sticks around for more than a
few seconds then the cancel isn't working. We wait for a minute
because CI can't be trusted to do anything quickly.
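The retry in step 2 can be sketched as a generic poll-until helper. This is a minimal illustration, not the test's actual code: `PollUntilSketch` and `waitUntil` are hypothetical names, and the `BooleanSupplier` stands in for "the `_tasks` API no longer reports the agg".

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class PollUntilSketch {
    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met in time, false otherwise
     * (including when the waiting thread is interrupted).
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller would pass a one-minute timeout and treat a `false` return as "the cancel isn't working".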
Closes #121993
This PR addresses issues around aggregations cancellation, mentioned in https://github.com/elastic/elasticsearch/issues/108701 and other places. In brief, during aggregations collection time, we respect cancellation via the mechanisms in the searcher to poison cancelled queries. But once the aggregation finishes collection, there is no further need to interact with the searcher, so we cannot rely on that for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.
Checking for cancellation is a trade-off, as the check itself is somewhat expensive (it involves a volatile read), so we want to balance checking often enough that cancelled queries aren't taking up resources for a long time, but not so frequently that it slows down most aggregation queries. Our first attempt at this is to check once when we go to build sub-aggregations, as the worst cases for this that we've seen involve needing to build deep sub-aggregation trees. Checking at sub-aggregation construction time also provides a conveniently centralized method call to add the check to.
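The placement of the check can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `CancellationCheckSketch`, `Node`, `CancelledException`, and both build methods are hypothetical names, and an `AtomicBoolean` stands in for whatever carries the cancellation state (its `get()` is a volatile read).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

final class CancellationCheckSketch {
    /** Hypothetical stand-in for an aggregation with sub-aggregations. */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, List<Node> children) {
            this.name = name;
            this.children = children;
        }
    }

    /** Hypothetical exception type used to abort result building. */
    static final class CancelledException extends RuntimeException {
        CancelledException(String message) {
            super(message);
        }
    }

    private final AtomicBoolean cancelled;

    CancellationCheckSketch(AtomicBoolean cancelled) {
        this.cancelled = cancelled;
    }

    /** Renders the results tree for one node, descending into sub-aggregations if any. */
    String buildResults(Node node) {
        StringBuilder sb = new StringBuilder(node.name);
        if (!node.children.isEmpty()) {
            sb.append('[').append(buildSubAggregations(node.children)).append(']');
        }
        return sb.toString();
    }

    /** The single, centralized place where cancellation is checked: once per call. */
    private String buildSubAggregations(List<Node> children) {
        if (cancelled.get()) {
            throw new CancelledException("cancelled while building sub-aggregations");
        }
        StringBuilder sb = new StringBuilder();
        for (Node child : children) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(buildResults(child));
        }
        return sb.toString();
    }
}
```

In this sketch a leaf with no sub-aggregations never pays for the volatile read, while a deep tree is checked once per level it descends.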
---------
Conflicts:
test/framework/src/main/java/org/elasticsearch/search/aggregations/AggregatorTestCase.java
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
If the query hits the failing index first, we will cancel the request,
preventing exchange-sink requests and data-node requests from reaching
another data node. As a result, exchange sinks could stay for 30
seconds.
We need to explicitly add the incubating vector API module to the third
party audit task on Java 24.
(cherry picked from commit 0c667ecd2a)
# Conflicts:
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/precommit/ThirdPartyAuditTask.java
Under very unfortunate conditions tests that check xContent objects
roundtrip parsing (e.g. SearchHitsTests#testFromXContent)
can fail when we happen to randomly pick YAML xContent type and create
random (realistic) Unicode character sequences that may contain the
character U+0085 (133) from the Latin1 code page. That specific character
doesn't get parsed back to its original form for YAML xContent, which can
lead to rare but hard to diagnose test failures.
This change adds logic to AbstractXContentTestCase#test(), which lies at
the core of most of our xContent roundtrip tests, that disallows test
instances containing that particular character when using YAML xContent
type.
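The kind of filter described above can be sketched as a simple predicate. This is an illustration only, not the logic added to AbstractXContentTestCase: `YamlRoundTripFilterSketch` and `isUsable` are hypothetical names, and the format is passed as a plain string rather than an xContent type.

```java
final class YamlRoundTripFilterSketch {
    /** U+0085 (decimal 133), written as a cast to avoid a unicode escape in source. */
    static final char PROBLEM_CHAR = (char) 0x85;

    /**
     * Returns true if a rendered test instance may be used for a round-trip test.
     * Instances containing U+0085 are rejected only when the format is YAML.
     */
    static boolean isUsable(String formatName, String renderedInstance) {
        boolean isYaml = "yaml".equalsIgnoreCase(formatName);
        boolean hasProblemChar = renderedInstance.indexOf(PROBLEM_CHAR) >= 0;
        return !(isYaml && hasProblemChar);
    }
}
```

A test harness could keep generating random instances until one passes such a predicate.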
Closes #97716
The upper bound of randomVersionBetween is inclusive; therefore, for
testing the fallback version of the request, we need to use the version
preceding 8.16.0 rather than 8.16.0 itself.
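The off-by-one can be illustrated with plain integers standing in for versions. This is a sketch under that assumption, not the real randomVersionBetween: `InclusiveBoundSketch`, `randomBetweenInclusive`, and `randomBefore` are hypothetical names.

```java
import java.util.Random;

final class InclusiveBoundSketch {
    /** Returns a value in [min, max]; like the helper described above, max itself can be returned. */
    static int randomBetweenInclusive(Random random, int min, int max) {
        return min + random.nextInt(max - min + 1);
    }

    /** Returns a value in [min, exclusiveMax) by passing the preceding value as the inclusive bound. */
    static int randomBefore(Random random, int min, int exclusiveMax) {
        return randomBetweenInclusive(random, min, exclusiveMax - 1);
    }
}
```

With an inclusive bound, a test that must stay below some version has to pass the predecessor of that version, which is what `randomBefore` does here.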
Closes #117937
Add missing apm-server tail sampling monitoring metrics to stack monitoring mapping. They were missed in #110568.
(cherry picked from commit f3f5135f06)
# Conflicts:
# x-pack/plugin/monitoring/src/main/java/org/elasticsearch/xpack/monitoring/MonitoringTemplateRegistry.java
Fixes a bug where the deployment ID was lost when creating the text
embedding model configuration.
# Conflicts:
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elasticsearch/ElasticsearchInternalService.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elasticsearch/ElserInternalServiceSettings.java
# x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elasticsearch/MultilingualE5SmallInternalServiceSettings.java
# x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/elasticsearch/ElasticsearchInternalServiceTests.java
* Fix inference update API calls with task_type in body or deployment_id defined
* Update docs/changelog/121231.yaml
* Fixing test
* Reuse existing deployment ID retrieval logic
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
If the `MasterService` needs to log a create-snapshot task description
then it will call `CreateSnapshotTask#toString`, which today calls
`RepositoryData#toString`, which is not overridden and so ends up calling
`RepositoryData#hashCode`. This can be extraordinarily expensive in a
large repository. Worse, if there are masses of create-snapshot tasks to
execute then it'll do this repeatedly, because each one only ends up
yielding a short hex string so we don't reach the description length
limit very easily.
With this commit we provide a more efficient implementation of
`CreateSnapshotTask#toString` and also override
`RepositoryData#toString` to protect against some other caller running
into the same issue.
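The underlying mechanism is that the default `Object#toString` appends the hex form of `hashCode()`. The sketch below demonstrates this with hypothetical classes (`ToStringCostSketch`, `ExpensiveData`, `CheapData`); it is not the `RepositoryData` implementation, and the call counter exists only to make the effect observable.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class ToStringCostSketch {
    /** Counts hashCode invocations so the cost of toString is observable. */
    static final AtomicInteger HASH_CODE_CALLS = new AtomicInteger();

    /** Stand-in for a large object whose hashCode walks all of its contents. */
    static class ExpensiveData {
        final int[] contents;

        ExpensiveData(int[] contents) {
            this.contents = contents;
        }

        @Override
        public int hashCode() {
            HASH_CODE_CALLS.incrementAndGet();
            return Arrays.hashCode(contents);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof ExpensiveData && Arrays.equals(contents, ((ExpensiveData) other).contents);
        }
        // toString is not overridden, so Object#toString is used, which calls hashCode().
    }

    /** Same data, but with a toString that only reports a cheap summary. */
    static class CheapData extends ExpensiveData {
        CheapData(int[] contents) {
            super(contents);
        }

        @Override
        public String toString() {
            return "CheapData[size=" + contents.length + "]";
        }
    }
}
```

Calling `toString()` on `ExpensiveData` increments the counter, while `CheapData` leaves it untouched.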
* [Gradle] Make rolling upgrade tests configuration cache compatible (#119577)
With this, all rolling upgrade tests that involve a
`nextNodeToNextVersion` update are gradle configuration cache
compatible.
Simplify the API around the test cluster registry and the
configuration-cache-compatible usage of the test cluster in
TestClusterAware tasks.
(cherry picked from commit 7b6bdfa323)
# Conflicts:
# qa/ccs-rolling-upgrade-remote-cluster/build.gradle
# x-pack/plugin/sql/qa/jdbc/security/build.gradle
# x-pack/plugin/sql/qa/server/security/build.gradle
* Fix backport merge issue