* Unwrap exception more tenaciously in testQueuedOperationsAndBrokenRepoOnMasterFailOver (#102352)
There can be more than 10 layers of wrapping `RuntimeException`s (RTEs),
see #102351. As a workaround for the test failure, this commit manually
unwraps them all.
Closes #102348
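For illustration, a minimal sketch of the kind of unwrapping loop described above (the helper name is hypothetical, not the actual test code):
```java
// Hypothetical helper (not the actual test code): keep unwrapping
// RuntimeException layers until we reach the underlying cause, however
// deeply it is wrapped.
final class Exceptions {
    static Throwable unwrapRuntimeExceptions(Throwable t) {
        while (t instanceof RuntimeException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }
}
```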
* Fixup
A call to `ConnectionTarget#connect` which happens strictly after all
calls that close connections should leave us connected to the target.
However concurrent calls to `ConnectionTarget#connect` can overlap, and
today this means that a connection returned from an earlier call may
overwrite one from a later call. The trouble is that the earlier
connection attempt may yield a closed connection (it was concurrent with
the disconnections) so we must not let it supersede the newer one.
With this commit we prevent concurrent connection attempts, which stops
earlier attempts from overwriting the connections resulting from later
attempts.
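As a rough sketch of the idea (all names are illustrative, not the actual `ConnectionTarget` implementation), overlapping `connect` calls can be serialized so that a stale attempt never overwrites a newer connection:
```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative single-flight connector: at most one connection attempt runs
// at a time, so an earlier attempt can never overwrite the result of a
// later one. In the real fix, callers arriving mid-attempt trigger a fresh
// attempt; that retry logic is omitted here for brevity.
class SingleFlightConnector {
    private final Object mutex = new Object();
    private final Queue<Runnable> waiters = new ArrayDeque<>();
    private boolean attemptInFlight;

    void connect(Runnable onConnected) {
        synchronized (mutex) {
            waiters.add(onConnected);
            if (attemptInFlight) {
                return; // piggy-back on the attempt already running
            }
            attemptInFlight = true;
        }
        runAttempt();
    }

    private void runAttempt() {
        // ... open the connection here, then complete the waiters ...
        Queue<Runnable> toNotify;
        synchronized (mutex) {
            attemptInFlight = false;
            toNotify = new ArrayDeque<>(waiters);
            waiters.clear();
        }
        toNotify.forEach(Runnable::run);
    }
}
```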
Backport of #92558
When combined with #101910, closes #100493
Today we call `Transport.Connection#onRemoved`, notifying any
removed-listeners, when the connection is closed and removed from the
`connectedNodes` map. However, it's possible for the connection to be
closed while we're still adding it to the map and setting up the
listeners, so this now-dead connection will still be found in the
`pendingConnections` map and may be returned by a future call to
`connectToNode`, even if that call was made after all the
removed-listeners have been called.
With this commit we delay calling the removed-listeners until the
connection is closed and removed from both the `connectedNodes` and
`pendingConnections` maps.
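One way to picture the fix (a hypothetical sketch, not the actual code): only notify the removed-listeners once the connection has been dropped from both maps.
```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical barrier: the removed-listeners run only after the connection
// has been removed from both connectedNodes and pendingConnections.
class RemovalBarrier {
    private final AtomicInteger pendingRemovals = new AtomicInteger(2);
    private final Runnable notifyRemovedListeners;

    RemovalBarrier(Runnable notifyRemovedListeners) {
        this.notifyRemovedListeners = notifyRemovedListeners;
    }

    // Call once when removed from connectedNodes and once when removed
    // from pendingConnections; the listeners fire on the second call.
    void onRemovedFromMap() {
        if (pendingRemovals.decrementAndGet() == 0) {
            notifyRemovedListeners.run();
        }
    }
}
```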
Backport of #92546 to 7.17
Relates #100493
Today `TcpTransport#openConnection` may throw exceptions on certain
kinds of failure, but other kinds of failure are passed to the listener.
This is trappy and not all callers handle it correctly. This commit
makes sure that all exceptions are passed to the listener.
Closes #100510
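The pattern being described is roughly the following (a sketch with a made-up listener interface and class names, not the actual `TcpTransport` API):
```java
// Minimal sketch: route every failure, including synchronous ones, through
// the listener so callers have a single failure path to handle.
interface Listener<T> {
    void onResponse(T result);
    void onFailure(Exception e);
}

class TransportSketch {
    void openConnection(String node, Listener<String> listener) {
        try {
            startAsyncConnect(node, listener); // may throw during setup
        } catch (Exception e) {
            listener.onFailure(e); // previously this would propagate to the caller
        }
    }

    private void startAsyncConnect(String node, Listener<String> listener) {
        // stand-in for the real asynchronous connection logic
        listener.onResponse("connection-to-" + node);
    }
}
```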
These tests were muted at both the suite level and the test level
for reasons I don't fully understand, and were then unmuted at one level
but not the other. They don't appear to fail after a few thousand runs,
so this PR unmutes them the rest of the way.
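For context, the two muting levels look roughly like this in the test framework (the `@AwaitsFix` annotation is real; the class, test method, and bug URLs below are made up):
```java
import org.apache.lucene.util.LuceneTestCase;
import org.elasticsearch.test.ESTestCase;

// Illustrative only: muted at both levels, so unmuting just one of the
// two annotations leaves the test effectively muted.
@LuceneTestCase.AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/XXXXX") // suite-level mute
public class ExampleIT extends ESTestCase {

    @LuceneTestCase.AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/XXXXX") // test-level mute
    public void testExample() {
    }
}
```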
* Include branch information in build scans for buildkite jobs (#101284)
# Conflicts:
# build-tools-internal/src/main/groovy/elasticsearch.build-scan.gradle
* Align with other branches
* Remove translog from bwc testRecovery (#101068)
When the test tried to exercise the recovery of translog ops, it failed:
since we flush on close/shutdown, there were never any translog ops to
recover.
That makes the translog-recovery code in the test irrelevant, so this PR
removes it.
Alternatively, we could simulate killing nodes forcibly before
upgrading, but (a) that seems out of the ordinary for upgrades, and
(b) when we tried it, the test still did not pass consistently because
sometimes the flush on close happened anyway.
Fixes #52031
* Fix
* `WaitForSnapshotStep` verifies if the index belongs to the latest snapshot of that SLM policy (#100911)
The `WaitForSnapshotStep` used to check whether the SLM policy had been
executed after the index entered the delete phase, but it did not check
whether the SLM policy included this index.
As a result, if the user configured an SLM policy that did not include
this index, then when the index entered the `WaitForSnapshotStep` it
would wait for a snapshot to be taken (one that would not include the
index), and ILM would then delete the index.
See the exact reproduction path:
https://github.com/elastic/elasticsearch/issues/57809
**Solution** With this PR, after the step finds a successful SLM run, it
verifies that the snapshot taken by SLM contains this index. If not, it
throws an error; otherwise it proceeds.
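A minimal sketch of that check, assuming the set of index names in the last successful snapshot is already at hand (class and method names are hypothetical, not the actual `WaitForSnapshotStep` code); the thrown message matches the `step_info` shown below:
```java
import java.util.Set;

// Hypothetical sketch of the new verification, not the real implementation.
final class SnapshotMembershipCheck {
    static void verifyIndexInLastSnapshot(String indexName, String policyName, Set<String> snapshotIndices) {
        if (snapshotIndices.contains(indexName) == false) {
            throw new IllegalStateException(
                "the last successful snapshot of policy '" + policyName
                    + "' does not include index '" + indexName + "'");
        }
    }
}
```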
ILM explain will report:
```
"step_info": {
"type": "illegal_state_exception",
"reason": "the last successful snapshot of policy 'hourly-snapshots' does not include index '.ds-my-other-stream-2023.10.16-000001'"
}
```
**Backwards compatibility concerns** In this PR, the
`WaitForSnapshotStep` changed from a `ClusterStateWaitStep` to an
`AsyncWaitStep`. We do not think this is going to cause an issue. This
was tested manually with the following steps:
- Run a master node with the old version.
- When ILM is executing `wait-for-snapshot`, shut down the node.
- Start the node again with the new version of ES.
- ES was able to pick up the step and continue with the new code.
We believe that this covers the bwc concerns.
Fixes: https://github.com/elastic/elasticsearch/issues/57809
(cherry picked from commit 5697fcf594)
Refactor testRerouteRecovery, pulling the testing of shard recovery
throttling out into separate targeted tests. There are now two
additional tests, one testing source node throttling and the other
testing target node throttling. Throttling both nodes at once leads
primarily to the source node registering throttling, while the target
node mostly has no cause to instigate throttling.
(cherry picked from commit 323d9366df)
* Update gradle wrapper to 8.3 (#97838)
Gradle now fully supports compiling, testing, and running on Java 20.
Among other general performance improvements, this release introduces the `--test-dry-run` command line option, which allows checking whether tests are filtered out by Gradle.
Required updating the nebula ospackage plugin, as setuid support was broken with Gradle 8.3.
(cherry picked from commit b23e000c30)
# Conflicts:
# build-tools-internal/src/integTest/groovy/org/elasticsearch/gradle/internal/test/rest/LegacyYamlRestCompatTestPluginFuncTest.groovy
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/ElasticsearchJavaModulePathPlugin.java
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/test/rest/compat/compat/AbstractYamlRestCompatTestPlugin.java
# build-tools-internal/src/main/resources/minimumGradleVersion
# gradle/verification-metadata.xml
# gradle/wrapper/gradle-wrapper.jar
# gradlew
# x-pack/plugin/watcher/qa/with-monitoring/src/javaRestTest/java/org/elasticsearch/smoketest/MonitoringWithWatcherRestIT.java
* [7.17] Use patched nebula os package gradle plugin
* Update testingconvention precommit integ test
* Log a debug level message for deleting non-existing snapshot (#100479)
The new debug-level message pairs with the "deleting snapshots" log
message at info level.
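For example, the pairing might look roughly like this (a sketch using the standard Log4j API; the class, method, and message wording below are made up, the real change is in `SnapshotsService`):
```java
import java.util.List;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Illustrative sketch only, not the actual SnapshotsService code.
class SnapshotDeletionSketch {
    private static final Logger logger = LogManager.getLogger(SnapshotDeletionSketch.class);

    void deleteSnapshots(List<String> requested, List<String> existing) {
        logger.info("deleting snapshots {}", requested); // existing info-level message
        for (String name : requested) {
            if (existing.contains(name) == false) {
                // new debug-level message pairing with the info-level one above
                logger.debug("snapshot [{}] does not exist, nothing to delete", name);
            }
        }
    }
}
```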
(cherry picked from commit 2cfdb7a92d)
# Conflicts:
# server/src/main/java/org/elasticsearch/snapshots/SnapshotsService.java
* spotless
* compilation
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>