CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators: in particular, a Java
Iterator's next() both advances and returns the element, so each value can
only be grabbed once. This commit reworks
uniquify to be based on list indices instead of iterators.
closes#126883
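A minimal sketch of the index-based approach, assuming a sorted list and a comparator (the method shape below is illustrative, not the actual CollectionUtils signature):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class UniquifySketch {

    // Removes adjacent duplicates in place, in the spirit of C++ std::unique,
    // but driven by list indices rather than a Java Iterator (whose next() both
    // advances and consumes the value).
    static <T> void uniquify(List<T> list, Comparator<? super T> comparator) {
        if (list.size() <= 1) {
            return;
        }
        int write = 0; // index of the last unique element kept so far
        for (int read = 1; read < list.size(); read++) {
            if (comparator.compare(list.get(write), list.get(read)) != 0) {
                write++;
                list.set(write, list.get(read));
            }
        }
        list.subList(write + 1, list.size()).clear(); // drop the stale tail
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(List.of(1, 1, 2, 2, 3));
        uniquify(values, Comparator.naturalOrder());
        System.out.println(values); // prints [1, 2, 3]
    }
}
```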
Today we rely on registering the channel after registering the task to
be cancelled to ensure that the task is cancelled even if the channel is
closed concurrently. However, the client may already have processed a
cancellable request on the channel, and therefore this mechanism doesn't
work. With this change, we make sure that no further tasks can be registered
after the registrations have been drained for cancellation.
Closes#88201
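A rough sketch of the intended ordering guarantee, using a hypothetical per-channel registry (class and method names are illustrative, not the actual task manager code): once the registrations have been drained for cancellation, a late registration is cancelled immediately instead of being recorded and lost.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-channel registry illustrating "never register after draining".
final class ChannelTaskRegistrySketch {

    private final List<Runnable> cancellations = new ArrayList<>();
    private boolean drained = false;

    // Registers a task's cancellation hook; if the channel has already been closed
    // and drained, the hook runs immediately so the task cannot be leaked.
    synchronized void register(Runnable cancelTask) {
        if (drained) {
            cancelTask.run();
            return;
        }
        cancellations.add(cancelTask);
    }

    // Called when the channel closes: cancel everything registered so far and make
    // sure no later registration can slip in behind the drain.
    void drainAndCancel() {
        List<Runnable> toCancel;
        synchronized (this) {
            drained = true;
            toCancel = new ArrayList<>(cancellations);
            cancellations.clear();
        }
        toCancel.forEach(Runnable::run);
    }
}
```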
This reverts commit 4a77e06. We've seen a significant performance degradation in merging vectors resulting from the use of MADV_RANDOM and MGLRU (and LRU in recent Linux kernels).
For the 8.x release train, we will revert the change that enabled MADV_RANDOM and backport it to all shipping 8.x bugfix releases.
relates: #124499
With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.
The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.
While sort by field is not optimized for search concurrency, top hits benefits from it,
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.
This commit re-enables concurrency for top hits when sort by field is used. This
introduces back a discrepancy between top level search and top hits, in that concurrency
is applied for top hits even though sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
We have some tolerance around how many bytes we report for these completion fields. But the
values depend on the distribution of the random values that determine how many docs get
the optional field. This commit makes the test more precise by computing the real ratio
between docs that have the optional field and the total number of docs, so that we
can base the assertion on more realistic expectations.
Closes#123269
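A small, self-contained illustration of the arithmetic (the per-field byte overhead and names are invented for the sketch): the expected size is derived from the observed fraction of docs carrying the optional field rather than from a fixed tolerance.

```java
import java.util.Random;

// Illustrative only: derive the assertion bound from the real ratio of docs that
// received the optional completion field in this particular random run.
public final class CompletionStatsRatioSketch {

    public static void main(String[] args) {
        Random random = new Random(42);
        int totalDocs = 1_000;
        int docsWithOptionalField = 0;
        for (int i = 0; i < totalDocs; i++) {
            if (random.nextBoolean()) { // stand-in for the test's randomised field presence
                docsWithOptionalField++;
            }
        }
        double ratio = (double) docsWithOptionalField / totalDocs;
        long assumedBytesPerField = 64; // hypothetical per-field overhead
        double expectedBytes = ratio * totalDocs * assumedBytesPerField;
        System.out.printf("ratio=%.3f expectedBytes=%.0f%n", ratio, expectedBytes);
        // the real test would now assert that the reported size is within a small
        // margin of expectedBytes instead of using a generous fixed tolerance
    }
}
```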
When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might timeout just about at the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug.
This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case.
If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`.
Fixes#124667
Relates to #18613
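A simplified sketch of the resulting queue choice (the real `EsExecutors.newScaling` takes more parameters, and the probing lives in `ForceQueuePolicy`): with core size 0 and max size 1 a plain `LinkedTransferQueue` is enough, because there is no reject-to-scale-up handshake that can race with the lone worker timing out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ScalingExecutorSketch {

    // Simplified stand-in for the max-pool-size-1 case of EsExecutors.newScaling.
    static ExecutorService newSingleWorkerScaling(long keepAlive, TimeUnit unit) {
        // ThreadPoolExecutor adds a worker whenever a task is queued while the pool
        // is empty, so an unbounded LinkedTransferQueue cannot starve queued work.
        return new ThreadPoolExecutor(0, 1, keepAlive, unit, new LinkedTransferQueue<>());
        // For max pool size > 1 the real code keeps ExecutorScalingQueue and, after
        // force-queueing a rejected task, probes the pool to ensure at least one
        // worker is alive to drain it.
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = newSingleWorkerScaling(30, TimeUnit.SECONDS);
        executor.execute(() -> System.out.println("ran on " + Thread.currentThread().getName()));
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```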
In #112380 we changed this `assert` to yield a `String` on failure
rather than the original `ElasticsearchException`, which means we don't
see the original completion's stack trace any more. This commit
reinstates the lost stack trace.
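A hedged sketch of the general technique (field and method names are illustrative): when the detail expression of an `assert` is a `Throwable` rather than a `String`, the resulting `AssertionError` carries it as its cause, so the original stack trace shows up in the failure.

```java
public final class AssertCauseSketch {

    // Hypothetical field standing in for the stored completion exception.
    private Exception completion;

    void assertNotCompleted() {
        // Yielding a String loses the original stack trace:
        //   assert completion == null : "already completed: " + completion;
        // Yielding the exception itself keeps it, because AssertionError(Object)
        // adopts a Throwable argument as the cause of the error it constructs.
        assert completion == null : completion;
    }
}
```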
* Fix concurrency issue in ScriptSortBuilder (#123757)
Inter-segment concurrency is disabled whenever sort by field, including script sorting, is used in a search request.
The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is built per slice and the different search threads don't share information about the documents they have already visited, etc.
The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses this concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover sort scripts that rely on _score being available and top_hits aggregations with a scripted sort clause.
* iter
* Revert fail-fast disconnect strategy for `_resolve/cluster`
This was the first solution for the long wait times that users could
potentially see when invoking this API. However, we reverted this change
in favour of the new timeout parameter. Unfortunately, the change in
disconnect strategy targeted a broader set of versions than the timeout
parameter PR (which contained the revert). This PR fixes that discrepancy.
* Update docs/changelog/124241.yaml
* Fix Gradle deprecation warning: declaring an is- property with a Boolean type has been deprecated (a minimal sketch of the pattern follows below).
* Make use of the new layout.settingsFolder API to address some cross-project references
* Fix buildParams snapshot check for multi-project builds
(cherry picked from commit e19b2264af)
# Conflicts:
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/BaseInternalPluginBuildPlugin.java
# docs/build.gradle
# qa/entitlements/build.gradle
# x-pack/qa/multi-project/core-rest-tests-with-multiple-projects/build.gradle
# x-pack/qa/multi-project/xpack-rest-tests-with-multiple-projects/build.gradle
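A minimal sketch of the deprecated pattern behind the first fix (the extension class and property are invented for illustration): Gradle now warns when an `is`-prefixed getter is declared with the boxed `Boolean` type, since that is not a valid JavaBean boolean property; using primitive `boolean` (or renaming the accessor to `get...`) avoids the warning.

```java
// Illustrative only; not one of the actual build-tools-internal classes.
public abstract class ExamplePluginExtension {

    private boolean enabled = true;

    // Deprecated shape that triggers the warning:
    //   public Boolean isEnabled() { return enabled; }

    // Accepted shape: an is- getter returning the primitive type.
    public boolean isEnabled() {
        return enabled;
    }

    public void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }
}
```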
We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We should apply the same rules
in top_hits. We recently stumbled upon non-deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.
The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
Rather than checking the license (updating the usage map) on every
single shard, just do it once at the start of a computation that needs
to forecast write loads.
Backport of #123346 to 8.x
Closes#123247
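The shape of the change as a hedged sketch with invented names: the license check (which also records feature usage) is hoisted out of the per-shard loop and performed once per forecasting computation.

```java
import java.util.List;

final class WriteLoadForecastSketch {

    interface LicenseChecker {
        boolean isForecastingAllowed(); // also updates the usage map in the real code
    }

    static void forecastWriteLoads(List<String> shardIds, LicenseChecker licenseChecker) {
        // Before: the check ran inside the loop, once per shard.
        // After: one check up front for the whole computation.
        if (licenseChecker.isForecastingAllowed() == false) {
            return;
        }
        for (String shardId : shardIds) {
            forecastForShard(shardId);
        }
    }

    private static void forecastForShard(String shardId) {
        // placeholder for the actual per-shard forecasting work
    }
}
```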
These things can be quite expensive and there's no need to recompute
them in parallel across all management threads as done today. This
commit adds a deduplicator to avoid redundant work.
Backport of #123246 to `8.x`
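A minimal sketch of the deduplication idea, using a generic future-based single-flight helper (names are invented; the real Elasticsearch deduplicator differs in detail): concurrent callers asking for the same expensive result share one in-flight computation instead of each recomputing it on its own management thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

final class SingleFlightDeduplicator<K, V> {

    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    // The first caller for a key performs the computation; callers arriving while it
    // is in flight simply wait on the same future.
    CompletableFuture<V> execute(K key, Supplier<V> expensiveComputation) {
        CompletableFuture<V> future = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, future);
        if (existing != null) {
            return existing; // someone else is already computing this result
        }
        try {
            future.complete(expensiveComputation.get());
        } catch (Exception e) {
            future.completeExceptionally(e);
        } finally {
            inFlight.remove(key, future); // allow a fresh computation next time
        }
        return future;
    }
}
```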
The block-on-data-node returns once the data node begins to process the
cluster state update for the new snapshot. This is before the master can see the
changes. In edge cases, the listener may be completed too early, before
the master can see the new snapshot. This PR flushes the master queue to
ensure the snapshot is visible.
Resolves: #116730
(cherry picked from commit 5d9385f1ca)
# Conflicts:
# muted-tests.yml
* [8.x] Logsdb and source only snapshots.
Backporting #122199 to 8.x branch.
Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.
Also added a few more tests that snapshot and restore logsdb data streams.
* fix test
When utilizing synthetic source with nested fields, we attempt to
rebuild the child values in addition to all the parent values.
While this generally works well, it is possible that certain values might
be missing from various child docs. Consequently, we would end up iterating
the vector values incorrectly, resulting in seemingly missing values or
exceptions indicating EOF.
closes: #122383
(cherry picked from commit f5c901e68c)
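A simplified sketch of the safer iteration pattern (Lucene usage here is generic numeric doc values; the real change deals with vector values, but the sparse-iterator discipline is the same): only read a value after confirming the iterator actually has one for the target child doc.

```java
import java.io.IOException;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.NumericDocValues;

final class SparseChildValuesSketch {

    // Doc-values style iterators are sparse: a child doc without a value must be
    // skipped rather than read, otherwise the iterator drifts and later reads seem
    // to return the wrong doc's value or run off the end of the data.
    static void collectChildValues(LeafReader leafReader, String field, int[] childDocIds) throws IOException {
        NumericDocValues values = leafReader.getNumericDocValues(field);
        if (values == null) {
            return; // no doc in this segment has the field at all
        }
        for (int childDoc : childDocIds) { // assumed to be in ascending doc id order
            if (values.advanceExact(childDoc)) { // read only when this child has a value
                long value = values.longValue();
                // ... rebuild the synthetic source entry from `value`
            }
            // else: this child is missing the value, so nothing is emitted for it
        }
    }
}
```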
Two of the timeout tests have been muted for several months. The reason is that we tightened the assertions to cover for partial results being returned, but there were edge cases in which partial results were not actually returned.
The timeout used in the test was time dependent, so precisely when the timeout would be thrown was unpredictable: we have timeout checks in different places in the codebase, when iterating through the leaves, before scoring any document, or while scoring documents. The edge case that caused failures is a typical timing issue where the initial timeout check in CancellableBulkScorer already triggers the timeout, before any document has been collected.
I made several adjustments to the test to make it more robust:
- use indexRandom to index documents, which speeds things up
- share indexing across test methods, so that it happens once at the suite level
- replace the custom query that triggers a timeout: rather than a script query, use a Lucene query that is not time dependent and throws a time-exceeded exception precisely where we expect it, so that we can test how the system reacts to that. This allows us to test that partial results are always returned when a timeout happens while scoring documents, and that partial results are never returned when a timeout happens before we even started scoring documents.
Closes#98369
Closes#98053
This PR addresses issues around aggregations cancellation, mentioned in https://github.com/elastic/elasticsearch/issues/108701 and other places. In brief, during aggregations collection time, we respect cancellation via the mechanisms in the searcher to poison cancelled queries. But once the aggregation finishes collection, there is no further need to interact with the searcher, so we cannot rely on that for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.
Checking for cancellation is a trade-off, as the check itself is somewhat expensive (it involves a volatile read), so we want to balance checking often enough that cancelled queries aren't taking up resources for a long time, but not so frequently that it slows down most aggregation queries. Our first attempt at this is to check once when we go to build sub-aggregations, as the worst cases we've seen involve needing to build deep sub-aggregation trees. Checking at sub-aggregation construction time also provides a conveniently centralized method call to add the check to; a minimal sketch of the pattern follows below.
---------
Conflicts:
test/framework/src/main/java/org/elasticsearch/search/aggregations/AggregatorTestCase.java
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
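A minimal sketch of where the check lands (names are illustrative; the real code throws a task-cancelled exception): the aggregator consults a cancellation flag, typically a volatile read behind a supplier, once per sub-aggregation build rather than per document.

```java
import java.util.function.BooleanSupplier;

final class SubAggregationBuildSketch {

    private final BooleanSupplier isCancelled; // backed by the task's volatile cancelled flag

    SubAggregationBuildSketch(BooleanSupplier isCancelled) {
        this.isCancelled = isCancelled;
    }

    Object buildSubAggregations(long bucketOrdinal) {
        // One centralized check: deeply nested result trees notice cancellation here
        // without paying for a volatile read on every bucket or document.
        if (isCancelled.getAsBoolean()) {
            throw new RuntimeException("task cancelled while building aggregation results");
        }
        // ... recurse into child aggregators and assemble their results for this bucket
        return null;
    }
}
```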
If the `MasterService` needs to log a create-snapshot task description
then it will call `CreateSnapshotTask#toString`, which today calls
`RepositoryData#toString` which is not overridden so ends up calling
`RepositoryData#hashCode`. This can be extraordinarily expensive in a
large repository. Worse, if there's masses of create-snapshot tasks to
execute then it'll do this repeatedly, because each one only ends up
yielding a short hex string so we don't reach the description length
limit very easily.
With this commit we provide a more efficient implementation of
`CreateSnapshotTask#toString` and also override
`RepositoryData#toString` to protect against some other caller running
into the same issue.
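The shape of the fix as a hedged sketch (fields and wording are illustrative): the task's `toString` is built from cheap, already-available identifiers, and `RepositoryData` gets a lightweight `toString` so no caller falls back to the `Object` default, which formats the expensive `hashCode()`.

```java
// Illustrative sketches, not the actual classes.
final class CreateSnapshotTaskSketch {

    private final String repositoryName;
    private final String snapshotName;

    CreateSnapshotTaskSketch(String repositoryName, String snapshotName) {
        this.repositoryName = repositoryName;
        this.snapshotName = snapshotName;
    }

    @Override
    public String toString() {
        // constant-time summary; never touches RepositoryData
        return "create snapshot [" + repositoryName + ":" + snapshotName + "]";
    }
}

final class RepositoryDataSketch {

    private final long generation;
    private final int snapshotCount;

    RepositoryDataSketch(long generation, int snapshotCount) {
        this.generation = generation;
        this.snapshotCount = snapshotCount;
    }

    @Override
    public String toString() {
        // cheap summary instead of Object#toString, which would call the expensive hashCode()
        return "RepositoryData[generation=" + generation + ", snapshots=" + snapshotCount + "]";
    }
}
```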
This PR resolves a regression introduced in #94564 by ensuring that the approximation is used when advancing matched query clauses.
Utilizing the two-phase iterator to validate matches guarantees that we do not attempt to find the next document fulfilling the two-phase criteria beyond the current document.
This fix prevents scenarios where matching a document in the second phase significantly increases query complexity, especially in cases involving restrictive second-pass filters.
Closes#120130
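A hedged sketch of the pattern (simplified Lucene usage, not the actual matched-queries code): to decide whether a clause matches the current doc, advance only the cheap approximation and confirm with `matches()`, rather than advancing the fully confirmed iterator, which would keep searching past the current doc for the next document that also passes the expensive second phase.

```java
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TwoPhaseIterator;

final class MatchedQueriesApproximationSketch {

    static boolean clauseMatches(Scorer scorer, int doc) throws IOException {
        TwoPhaseIterator twoPhase = scorer.twoPhaseIterator();
        if (twoPhase == null) {
            // no second phase: the plain iterator is already exact
            return advanceTo(scorer.iterator(), doc) == doc;
        }
        // advance the approximation only, then verify the current doc with matches()
        DocIdSetIterator approximation = twoPhase.approximation();
        return advanceTo(approximation, doc) == doc && twoPhase.matches();
    }

    private static int advanceTo(DocIdSetIterator iterator, int doc) throws IOException {
        return iterator.docID() >= doc ? iterator.docID() : iterator.advance(doc);
    }
}
```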
We are creating tmp files that might not get closed if an exception happens just after they are created. This commit makes sure all
errors are handled properly and the files are closed and deleted (a minimal sketch of the pattern follows below).
# Conflicts:
# muted-tests.yml
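A minimal sketch of the pattern (file names and the write step are invented for illustration): if anything fails after the temp file is created and before it is handed off, the stream is closed by try-with-resources and the file is deleted, so nothing leaks.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFileCleanupSketch {

    static Path writeTempFile(Path directory, byte[] contents) throws IOException {
        Path tempFile = Files.createTempFile(directory, "chunk-", ".tmp");
        boolean success = false;
        try (OutputStream out = Files.newOutputStream(tempFile)) {
            out.write(contents); // any failure here must not leave the file behind
            success = true;
        } finally {
            if (success == false) {
                Files.deleteIfExists(tempFile); // stream already closed by try-with-resources
            }
        }
        return tempFile;
    }
}
```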