Commit graph

14959 commits

Author SHA1 Message Date
Ryan Ernst
49eba87a0e
Fix uniquify to handle multiple successive duplicates (#126889) (#126954)
CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.

closes #126883
2025-04-17 06:16:38 +10:00
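The index-based approach described in commit 49eba87a0e can be illustrated roughly as follows. This is a minimal sketch, not the actual `CollectionUtils.uniquify` code: the class name `UniquifySketch`, the method signature, and the assumption that the input is a sorted, mutable, random-access list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; not the Elasticsearch implementation.
final class UniquifySketch {

    /**
     * Removes consecutive duplicates in place, in the spirit of C++ std::unique,
     * but using list indices rather than iterators.
     * Assumes the list is sorted (so equal elements are adjacent) and mutable.
     */
    static <T> void uniquify(List<T> list, Comparator<? super T> cmp) {
        if (list.size() <= 1) {
            return;
        }
        int write = 0; // index of the last unique element kept so far
        for (int read = 1; read < list.size(); read++) {
            if (cmp.compare(list.get(write), list.get(read)) != 0) {
                write++;
                list.set(write, list.get(read));
            }
        }
        // Drop the tail, which now holds only leftover duplicates.
        list.subList(write + 1, list.size()).clear();
    }
}
```

Because `read` and `write` are plain indices, an element can be read as many times as needed, which is what makes runs of three or more duplicates straightforward to handle.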
elasticsearchmachine
533ea966b5 Bump versions after 8.17.5 release 2025-04-15 14:47:42 +00:00
David Turner
e930e9fd4e
Fix race condition in RestCancellableNodeClient (#126686) (#126700)
Today we rely on registering the channel after registering the task to
be cancelled to ensure that the task is cancelled even if the channel is
closed concurrently. However the client may already have processed a
cancellable request on the channel and therefore this mechanism doesn't
work. With this change we make sure not to register another task after
draining the registrations in order to cancel them.

Closes #88201
2025-04-12 02:10:17 +10:00
Chris Hegarty
0c068d113d Revert "Enable madvise by default for all builds (#110159)" (#126308)
This reverts commit 4a77e06. We've seen a significant performance degradation in merging vectors resulting from the use of MADV_RANDOM and MGLRU (and LRU in recent Linux kernels).

For the 8.x release train, we will revert the change that enabled MADV_RANDOM and backport the revert to all shipping 8.x bugfix releases.

relates: #124499
2025-04-04 17:17:53 +01:00
Luca Cavanna
00fc9cd973
Re-enable parallel collection for field sorted top hits (#125916) (#126014)
With #123610 we disabled parallel collection for field and script sorted top hits,
aligning its behaviour with that of top level search. This was mainly to work around
a bug in script sorting that did not support inter-segment concurrency.

The bug with script sort has been fixed with #123757 and concurrency re-enabled for it.

While sort by field is not optimized for search concurrency, top hits benefits from it
and disabling concurrency for sort by field in top hits has caused performance
regressions in our nightly benchmarks.

This commit re-enables concurrency for top hits when sort by field is used. This
reintroduces a discrepancy between top level search and top hits, in that concurrency
is applied for top hits even though sort by field normally disables it. The key difference
is the context where sorting is applied, and the fact that concurrency is disabled
only for performance reasons on top level searches and not for functional reasons.
2025-04-01 19:48:18 +11:00
Luca Cavanna
2a1adad356
Address precision issue in IndexDiskUsageAnalyzerTests#testCompletionFields (#125849) (#125951) (#125963)
We have some tolerance around how many bytes we report for these completion fields. But the
values depend on the distribution of the random values that determine how many docs get
an optional field. This commit makes the test more precise by computing the real ratio
between docs that have the optional field and the total number of docs, so that we
can base assertions on more realistic expectations.

Closes #123269
2025-04-01 03:20:58 +11:00
elasticsearchmachine
635f64d21f Bump versions after 8.17.4 release 2025-03-25 15:23:21 +00:00
elasticsearchmachine
d8f3571eea Bump versions after 8.16.6 release 2025-03-25 12:23:43 +00:00
Ryan Ernst
d94f06670a
Cleanup command line setting errors (#124963) (#125133)
This commit improves the error cases when command line settings are
found that are duplicates or conflict with special system properties.
2025-03-19 04:57:35 +11:00
Moritz Mack
76218fff09
[8.17] Prevent work starvation bug if using scaling EsThreadPoolExecutor with core pool size = 0 (#124732) (#125068)
When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (and a new worker consequently cannot be added), available workers might time out at just about the same time as the task is then force queued by `ForceQueuePolicy`. This has caused starvation of work as observed for `masterService#updateTask` in #124667 where max pool size 1 is used. This configuration is most likely to expose the bug.

This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case.

If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`.

Fixes #124667
Relates to #18613
2025-03-18 11:55:20 +01:00
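For the max pool size 1, core pool size 0 case described in commit 76218fff09, the idea of pairing a single worker with a plain `LinkedTransferQueue` can be sketched with the JDK's own `ThreadPoolExecutor`. This is only an illustration under assumptions: it uses `java.util.concurrent.ThreadPoolExecutor` as a stand-in for `EsThreadPoolExecutor`, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch using the plain JDK executor, not EsExecutors.newScaling.
final class SingleWorkerScalingSketch {

    /** Core size 0, max size 1, unbounded queue: idle workers may time out. */
    static ThreadPoolExecutor newSingleWorkerPool(long keepAliveMillis) {
        return new ThreadPoolExecutor(
            0, 1, keepAliveMillis, TimeUnit.MILLISECONDS, new LinkedTransferQueue<>());
    }

    /** Submits the given number of tasks and reports whether all of them ran. */
    static boolean runAll(int tasks) {
        ThreadPoolExecutor pool = newSingleWorkerPool(10);
        CountDownLatch done = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                // With an unbounded queue the task is always accepted; the JDK
                // executor starts a worker if none is alive after queueing.
                pool.execute(done::countDown);
            }
            return done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdown();
        }
    }
}
```

The design point is that an unbounded queue never rejects work, so no force-queueing step is needed and a queued task cannot be left behind without a worker.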
Ignacio Vera
0dec400ebc
[8.17] Don't generate stacktrace in TaskCancelledException (#125002) (#125031)
* Don't generate stacktrace in TaskCancelledException (#125002)

# Conflicts:
#	modules/aggregations/src/internalClusterTest/java/org/elasticsearch/aggregations/bucket/SearchCancellationIT.java

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-03-18 03:29:12 +11:00
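One common way in Java to avoid the cost of capturing a stack trace, as the title of commit 0dec400ebc describes, is to disable the writable stack trace in the exception constructor. The sketch below shows that general technique; it is not taken from `TaskCancelledException`, and the class name is hypothetical.

```java
// Hypothetical sketch of a stack-trace-free exception; not the Elasticsearch class.
final class CancelledSketchException extends RuntimeException {

    CancelledSketchException(String message) {
        // Arguments: message, cause, enableSuppression, writableStackTrace.
        // Passing writableStackTrace = false skips the stack walk at construction.
        super(message, null, true, false);
    }
}
```

With this constructor, `getStackTrace()` returns an empty array, which is acceptable when the exception signals an expected condition such as cancellation rather than a bug.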
David Turner
c5b3dbccc3
Fix stack trace in ActionListener#assertOnce (#124672) (#124713)
In #112380 we changed this `assert` to yield a `String` on failure
rather than the original `ElasticsearchException`, which means we don't
see the original completion's stack trace any more. This commit
reinstates the lost stack trace.
2025-03-13 22:03:53 +11:00
Luca Cavanna
07f0a8b083
[8.17] Fix concurrency issue in ScriptSortBuilder (#123757) (#124517)
* Fix concurrency issue in ScriptSortBuilder (#123757)

Inter-segment concurrency is disabled whenever sort by field, including script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is built per slice and the different search threads don't share information about the documents they have already visited etc.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses this concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover sort scripts that rely on _score being available and top_hits aggregations with a scripted sort clause.

* iter
2025-03-11 22:33:45 +11:00
Pawan Kartik
0fb9e26220
Revert fail-fast disconnect strategy for _resolve/cluster (#124241)
* Revert fail-fast disconnect strategy for `_resolve/cluster`

This was the first solution for the long wait times that users could
potentially see when invoking this API. However, we reverted this change
in favour of the new timeout parameter. Unfortunately, the change in
disconnect strategy targeted a broader range of versions than the timeout
parameter PR (which contained the revert). This PR fixes this discrepancy.

* Update docs/changelog/124241.yaml
2025-03-07 18:21:00 +00:00
Niels Bauman
7ac2963622
[8.17] Avoid hoarding cluster state references during rollover (#124107) (#124267)
# Backport

This will backport the following commits from `main` to `8.17`:  -
[Avoid hoarding cluster state references during rollover
(#124107)](https://github.com/elastic/elasticsearch/pull/124107)

<!--- Backport version: 9.6.4 -->

### Questions ? Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)
2025-03-07 07:11:16 +11:00
Rene Groeschke
f4e505e635
Update Gradle wrapper to 8.13 (#122421) (#123875)
* Fix Gradle Deprecation warning as declaring an is- property with a Boolean type has been deprecated.
* Make use of new layout.settingsFolder api to address some cross project references
* Fix buildParams snapshot check for multi-project projects

(cherry picked from commit e19b2264af)

# Conflicts:
#	build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/BaseInternalPluginBuildPlugin.java
#	docs/build.gradle
#	qa/entitlements/build.gradle
#	x-pack/qa/multi-project/core-rest-tests-with-multiple-projects/build.gradle
#	x-pack/qa/multi-project/xpack-rest-tests-with-multiple-projects/build.gradle
2025-03-05 15:57:55 +01:00
elasticsearchmachine
0a854f8e31 Bump versions after 8.16.5 release 2025-03-04 14:28:01 +00:00
elasticsearchmachine
e48c08795b Bump versions after 8.17.3 release 2025-03-04 12:25:16 +00:00
Luca Cavanna
74ec162c72
Disable concurrency when top_hits sorts on anything but _score (#123610) (#123643)
We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We should apply the same rules
in top_hits. We recently stumbled upon non-deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
2025-02-28 08:36:07 +11:00
Keith Massey
60ee593dcd
Fixing serialization of ScriptStats cache_evictions_history (#123384) (#123614)
(cherry picked from commit 88cf2487e7)

# Conflicts:
#	server/src/main/java/org/elasticsearch/script/ScriptStats.java
2025-02-28 03:35:05 +11:00
elasticsearchmachine
6fdbe9ba4b Bump versions after 7.17.28 release 2025-02-27 15:43:45 +00:00
Nikolaj Volgushev
e882fa7340
Upgrade Netty to 4.1.118.Final (#122371) (#123482)
This PR upgrades Netty to `4.1.118.Final`.
2025-02-26 22:10:37 +11:00
Joe Gallo
4d8dcba267
Use ordered maps for PipelineConfiguration xcontent deserialization (#123403) (#123415) 2025-02-26 08:29:49 +11:00
David Turner
83e801540c
Reduce licence checks in LicensedWriteLoadForecaster (#123369) (#123409)
Rather than checking the license (updating the usage map) on every
single shard, just do it once at the start of a computation that needs
to forecast write loads.

Backport of #123346 to 8.x
Closes #123247
2025-02-26 07:06:33 +11:00
David Turner
a8470b035d
Deduplicate allocation stats calls (#123267) (#123281)
These things can be quite expensive and there's no need to recompute
them in parallel across all management threads as done today. This
commit adds a deduplicator to avoid redundant work.

Backport of #123246 to `8.x`
2025-02-25 03:48:57 +11:00
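The deduplication idea in commit a8470b035d, where concurrent callers share one in-flight computation instead of each recomputing it, can be sketched as below. This is an assumption-laden illustration, not the deduplicator used in Elasticsearch: the class `SingleFlightSketch` and its API are hypothetical, and the computation runs synchronously on the first caller's thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of single-flight deduplication; not the Elasticsearch code.
final class SingleFlightSketch<T> {
    private final AtomicReference<CompletableFuture<T>> inFlight = new AtomicReference<>();
    private final Supplier<T> expensiveComputation;

    SingleFlightSketch(Supplier<T> expensiveComputation) {
        this.expensiveComputation = expensiveComputation;
    }

    /** Returns the in-flight result if one exists, otherwise computes a fresh one. */
    CompletableFuture<T> get() {
        while (true) {
            CompletableFuture<T> existing = inFlight.get();
            if (existing != null) {
                return existing; // join the computation that is already running
            }
            CompletableFuture<T> mine = new CompletableFuture<>();
            if (inFlight.compareAndSet(null, mine)) {
                try {
                    mine.complete(expensiveComputation.get());
                } catch (RuntimeException e) {
                    mine.completeExceptionally(e);
                } finally {
                    inFlight.set(null); // later callers trigger a fresh computation
                }
                return mine;
            }
            // Lost the race to another caller; loop and pick up its future.
        }
    }
}
```

Calls that do not overlap in time each recompute the value, so results are never served stale; only genuinely concurrent requests are collapsed into one.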
Oleksandr Kolomiiets
e9e1a82fb1
fix stale data in synthetic source for string stored field (#123105) (#123278)
Co-authored-by: jeffganmr <106223805+jeffganmr@users.noreply.github.com>
2025-02-25 03:41:03 +11:00
Dimitris Rempapis
b7da5d9e5a
backport code to branch (#123138)
Backports the following commits to 8.18:  - Update testing code to fix
randomization and unmute test (#123125).
2025-02-22 19:27:52 +11:00
Yang Wang
cd625ca09a
[Test] Flush master queue before checking snapshots (#116938) (#122720)
The block-on-data-node returns once the data node begins to process the
cluster state update for the new snapshot. This is before the master can see the
changes. In edge cases, the listener may be completed too early before
the master can see the new snapshot. This PR flushes the master queue to
ensure the snapshot is visible.

Resolves: #116730
(cherry picked from commit 5d9385f1ca)

# Conflicts:
#	muted-tests.yml
2025-02-17 16:26:41 +11:00
Joe Gallo
8624765b51
Canonicalize processor names and types in IngestStats (#122610) (#122634) 2025-02-15 05:52:03 +11:00
elasticsearchmachine
e4e4cffccf Bump versions after 8.16.4 release 2025-02-14 16:17:39 +00:00
elasticsearchmachine
18e2b7efec Bump versions after 8.17.2 release 2025-02-14 15:52:39 +00:00
Martijn van Groningen
34a39bafc9
[8.x] Logsdb and source only snapshots. (#122572) (#122595)
* [8.x] Logsdb and source only snapshots.

Backporting #122199 to 8.x branch.

Addresses a few issues with logsdb and source only snapshots:
* Avoid initializing index sorting, because sort fields will not have doc values.
* Also disable doc value skippers when doc values get disabled.
* As part of source only validation figure out what the nested parent field is.

Also added a few more tests that snapshot and restore logsdb data streams.

* fix test
2025-02-14 23:45:19 +11:00
Ignacio Vera
4bdc3f9db0
Deduplicate IngestStats and IngestStats.Stats identity records when deserializing (#122496) (#122514)
This commit makes sure we reuse the existing static instance when deserializing to avoid excessive heap usage.
2025-02-14 04:22:27 +11:00
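Reusing a static identity instance during deserialization, as commit 4bdc3f9db0 describes, might look roughly like this. The record name, its fields, and the `of` factory are hypothetical stand-ins chosen for illustration; the real `IngestStats.Stats` reads from a stream rather than taking arguments.

```java
// Hypothetical sketch; field names and factory are assumptions, not the real record.
record StatsSketch(long count, long timeInMillis, long current, long failed) {

    /** Shared all-zero instance, so empty stats do not each allocate a new object. */
    static final StatsSketch IDENTITY = new StatsSketch(0, 0, 0, 0);

    /** Stand-in for a deserialization path: returns IDENTITY for all-zero values. */
    static StatsSketch of(long count, long timeInMillis, long current, long failed) {
        if (count == 0 && timeInMillis == 0 && current == 0 && failed == 0) {
            return IDENTITY;
        }
        return new StatsSketch(count, timeInMillis, current, failed);
    }
}
```

When many processors have never handled a document, most deserialized stats are all zeros, so returning the shared instance can save a large number of identical small objects.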
Benjamin Trent
1a426889ab
Fix synthetic source bug that would mishandle nested dense_vector fields (#122425) (#122437)
When utilizing synthetic source with nested fields, we attempt to
rebuild the child values in addition to all the parent values.

While this generally works well, it is possible that certain values might
be missing from various child docs. Consequently, we will attempt to
iterate the vector values strangely, resulting in seemingly missing
values or potentially exceptions indicating EOFs.

closes: #122383
(cherry picked from commit f5c901e68c)
2025-02-13 09:44:18 +11:00
elasticsearchmachine
f108363930 Bump versions after 8.16.4 release 2025-02-11 20:18:32 +00:00
elasticsearchmachine
53fdc71026 Bump versions after 8.17.2 release 2025-02-11 19:50:48 +00:00
Luca Cavanna
1614d02e2d
Fix SearchTimeoutIT (#120390) (#122205)
Two of the timeout tests have been muted for several months. The reason is that we tightened the assertions to cover for partial results being returned, but there were edge cases in which partial results were not actually returned.

The timeout used in the test was time dependent, so exactly when the timeout is thrown is unpredictable, because we have timeout checks in different places in the codebase: when iterating through the leaves, before scoring any document, or while scoring documents. The edge case that caused failures is a typical timing issue where the initial check for timeout in CancellableBulkScorer already triggers the timeout, before any document has been collected.

I made several adjustments to the test to make it more robust:
- use index random to index documents, that speeds it up
- share indexing across test methods, so that it happens once at the suite level
- replace the custom query that triggers a timeout to not be a script query, but rather a lucene query that is not time dependent, and throws a time exceeded exception precisely where we expect it, so that we can test how the system reacts to that. That allows us to test that partial results are always returned when a timeout happens while scoring documents, and that partial results are never returned when a timeout happens before we have even started to score documents.

Closes #98369
Closes #98053
2025-02-11 06:52:43 +11:00
Mark Tozzi
b35a9239e6
Aggregations cancellation after collection (#120944) (#121952)
This PR addresses issues around aggregations cancellation, mentioned in https://github.com/elastic/elasticsearch/issues/108701 and other places. In brief, during aggregations collection time, we respect cancellation via the mechanisms in the searcher to poison cancelled queries. But once the aggregation finishes collection, there is no further need to interact with the searcher, so we cannot rely on that for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.

Checking for cancellation is a trade-off, as the check itself is somewhat expensive (it involves a volatile read), so we want to balance checking often enough that cancelled queries aren't taking up resources for a long time, but not so frequently that it slows down most aggregation queries. Our first attempt at this is to check once when we go to build sub-aggregations, as the worst cases for this that we've seen involve needing to build deep sub-aggregation trees. Checking at sub-aggregation construction time also provides a conveniently centralized method call to add the check to.

---------

# Conflicts:
#	test/framework/src/main/java/org/elasticsearch/search/aggregations/AggregatorTestCase.java

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-02-07 08:53:02 +11:00
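The strategy from commit b35a9239e6, checking for cancellation once each time sub-aggregations are built, can be sketched as below. This is a simplified model under assumptions: `AggNodeSketch`, the `BooleanSupplier` cancellation flag, and the use of `IllegalStateException` are hypothetical stand-ins for the real aggregator classes and `TaskCancelledException`.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a result tree with a centralized cancellation check.
final class AggNodeSketch {
    private final String name;
    private final List<AggNodeSketch> subAggregations;

    AggNodeSketch(String name, List<AggNodeSketch> subAggregations) {
        this.name = name;
        this.subAggregations = subAggregations;
    }

    /** Builds a nested result string such as "a[b,c]". */
    String buildResult(BooleanSupplier isCancelled) {
        StringBuilder sb = new StringBuilder(name);
        if (subAggregations.isEmpty() == false) {
            // One check per level of sub-aggregation building, not per bucket,
            // to keep the cost of the (assumed volatile) read low.
            if (isCancelled.getAsBoolean()) {
                throw new IllegalStateException("task cancelled");
            }
            sb.append('[');
            for (int i = 0; i < subAggregations.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(subAggregations.get(i).buildResult(isCancelled));
            }
            sb.append(']');
        }
        return sb.toString();
    }
}
```

Deep trees hit the check at every level, so a cancelled request stops before descending further, while flat aggregations pay for at most one check.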
Mark Vieira
795d92bbdd
Upgrade mockito (#121849) (#121933) 2025-02-07 05:53:09 +11:00
Oleksandr Kolomiiets
747663ddda
Fix synthetic source issue with deeply nested ignored source fields (#121715) (#121789)
* Fix synthetic source issue with deeply nested ignored source fields

* Update docs/changelog/121715.yaml
2025-02-06 05:04:28 +11:00
Simon Cooper
7f4ce9c8f8
[8.17] Update transport and index version id numbers to S_PP (#121380) (#121522)
Backport #121380 to 8.17
2025-02-03 13:54:28 +00:00
David Turner
c34afe022b
Cheaper snapshot-related toString() impls (#121283) (#121307)
If the `MasterService` needs to log a create-snapshot task description
then it will call `CreateSnapshotTask#toString`, which today calls
`RepositoryData#toString` which is not overridden so ends up calling
`RepositoryData#hashCode`. This can be extraordinarily expensive in a
large repository. Worse, if there's masses of create-snapshot tasks to
execute then it'll do this repeatedly, because each one only ends up
yielding a short hex string so we don't reach the description length
limit very easily.

With this commit we provide a more efficient implementation of
`CreateSnapshotTask#toString` and also override
`RepositoryData#toString` to protect against some other caller running
into the same issue.
2025-01-31 04:15:13 +11:00
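The problem described in commit c34afe022b comes from `Object#toString`, which formats the result of `hashCode()`; if `hashCode()` walks a large structure, every log line pays for it. The sketch below shows the general remedy of overriding `toString` with a cheap summary. The class `RepositoryDataSketch` and its fields are hypothetical and much simpler than the real `RepositoryData`.

```java
import java.util.Map;

// Hypothetical sketch; not the real RepositoryData.
final class RepositoryDataSketch {
    private final long generation;
    private final Map<String, String> snapshots; // may be very large in practice

    RepositoryDataSketch(long generation, Map<String, String> snapshots) {
        this.generation = generation;
        this.snapshots = snapshots;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof RepositoryDataSketch other
            && generation == other.generation
            && snapshots.equals(other.snapshots);
    }

    @Override
    public int hashCode() {
        // Hashes every entry: cost grows with the size of the repository.
        return 31 * Long.hashCode(generation) + snapshots.hashCode();
    }

    @Override
    public String toString() {
        // Constant-time summary; avoids the default Object#toString, which calls hashCode().
        return "RepositoryDataSketch[generation=" + generation + ", snapshots=" + snapshots.size() + "]";
    }
}
```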
Panagiotis Bailis
f59a179613
[8.17] backporting fix for negative scores in text_similarity_ranker retriever (#121057) 2025-01-29 04:30:20 +11:00
Panagiotis Bailis
4f9e040031
[8.17] backporting support for deprecated window_size param for rank rrf (#120938) 2025-01-28 12:13:00 +02:00
Aurélien FOUCRET
bc5683adee
[8.17] LTR - Fix explain failure when index has multiple shards (#120717) (#120793)
* LTR - Fix explain failure when index has multiple shards  (#120717)

* Fix test failing in 8.x branch.
2025-01-25 07:26:01 +11:00
elasticsearchmachine
d977a52f59 Bump versions after 8.16.3 release 2025-01-21 16:31:52 +00:00
elasticsearchmachine
e31bb7daf8 Bump versions after 8.17.1 release 2025-01-21 16:13:55 +00:00
Jim Ferenczi
85f8a64177
Use approximation to advance matched queries (#120133) (#120146)
This PR resolves a regression introduced in #94564 by ensuring that the approximation is used when advancing matched query clauses.
Utilizing the two-phase iterator to validate matches guarantees that we do not attempt to find the next document fulfilling the two-phase criteria beyond the current document.
This fix prevents scenarios where matching a document in the second phase significantly increases query complexity, especially in cases involving restrictive second-pass filters.

Closes #120130
2025-01-15 08:56:19 +11:00
elasticsearchmachine
a0474778e1 Bump versions after 7.17.27 release 2025-01-14 19:47:51 +00:00
Ignacio Vera
89fe46eb05
[8.17] Fix potential file leak in ES816BinaryQuantizedVectorsWriter (#120014) (#120090)
We are creating tmp files that might not get closed if an exception happens just after their creation. This commit makes sure all
errors are handled properly and that the files are closed and deleted.
# Conflicts:
#	muted-tests.yml
2025-01-14 20:07:42 +11:00
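The kind of leak fixed in commit 89fe46eb05 is commonly avoided with a success flag plus try-with-resources, so that a temporary file is both closed and deleted when a write fails. The sketch below uses `java.nio.file` rather than Lucene's `Directory` API, and the names `TempFileSketch` and `Writer` are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch using java.nio; the real writer works with Lucene outputs.
final class TempFileSketch {

    /** Callback that writes the temporary file's contents and may fail. */
    interface Writer {
        void write(OutputStream out) throws IOException;
    }

    /** Writes a temp file; on failure the file is closed and then deleted. */
    static Path writeTemp(Path dir, Writer body) throws IOException {
        Path tmp = Files.createTempFile(dir, "vectors", ".tmp");
        boolean success = false;
        try (OutputStream out = Files.newOutputStream(tmp)) {
            body.write(out);
            success = true;
            return tmp;
        } finally {
            // The stream is already closed here: resources close before finally runs.
            if (success == false) {
                Files.deleteIfExists(tmp);
            }
        }
    }
}
```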