This supports local testing. It should not be included in
hardening_manifest.yml, which injects the scope at runtime.
# Conflicts:
# distribution/docker/src/docker/Dockerfile
Co-authored-by: Jon <jon@elastic.co>
This PR upgrades the reactor-netty-http library to the latest version, v1.0.39,
along with its transitive dependencies reactor-core (to v3.4.34) and
reactor-netty-core (to v1.0.39).
Backport of #102311
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Improve painless error wrapping (#100872)
Painless sandboxes some errors from Java from which it can recover. These
errors are wrapped within a ScriptException. However, retaining the
error as a cause can be confusing when walking the error chain. This
commit wraps the error so that the real error type does not appear,
but maintains the same error message in xcontent serialized form.
* fix compile
* Unwrap exception more tenaciously in testQueuedOperationsAndBrokenRepoOnMasterFailOver (#102352)
There can be more than 10 layers of wrapping RTEs, see #102351. As a
workaround to address the test failure, this commit just manually
unwraps them all.
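
As a rough illustration of what "manually unwraps them all" could look
like, here is a minimal sketch. The class and method names
(`UnwrapSketch`, `unwrapAll`) are hypothetical and are not taken from the
actual test; the sketch only assumes that the wrappers are plain
`RuntimeException` instances.

```java
final class UnwrapSketch {
    /**
     * Follows the cause chain for as long as the current throwable is a
     * plain RuntimeException wrapper with a cause, however deep the
     * nesting is. Hypothetical helper, not the code used in the test.
     */
    static Throwable unwrapAll(Throwable t) {
        Throwable current = t;
        // Stop at the first throwable that is not an exact RuntimeException
        // or that has no cause left to unwrap.
        while (current.getClass() == RuntimeException.class && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }
}
```

Unlike a helper with a fixed depth limit, this loop has no cap on the
number of layers it removes. It does not guard against cyclic cause
chains.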
Closes #102348
* Fixup
A call to `ConnectionTarget#connect` which happens strictly after all
calls that close connections should leave us connected to the target.
However concurrent calls to `ConnectionTarget#connect` can overlap, and
today this means that a connection returned from an earlier call may
overwrite one from a later call. The trouble is that the earlier
connection attempt may yield a closed connection (it was concurrent with
the disconnections) so we must not let it supersede the newer one.
With this commit we prevent concurrent connection attempts, which stops
earlier attempts from overwriting the connections resulting from later
attempts.
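
The idea of serializing connection attempts can be sketched as follows.
This is a simplified, blocking illustration only: `SerializedConnector`
and its methods are hypothetical names, and the real `ConnectionTarget`
is asynchronous and is not implemented this way.

```java
import java.util.function.Supplier;

final class SerializedConnector<C> {
    private final Object mutex = new Object();
    private C current;

    /**
     * Runs one connection attempt at a time. Because attempts cannot
     * overlap, an attempt that starts later also finishes later, so its
     * result is never overwritten by an earlier attempt.
     */
    C connect(Supplier<C> opener) {
        synchronized (mutex) {
            current = opener.get();
            return current;
        }
    }

    /** Returns the connection stored by the most recent attempt. */
    C current() {
        synchronized (mutex) {
            return current;
        }
    }
}
```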
Backport of #92558
When combined with #101910, closes #100493
Today we call `Transport.Connection#onRemoved`, notifying any
removed-listeners, when the connection is closed and removed from the
`connectedNodes` map. However, it's possible for the connection to be
closed while we're still adding it to the map and setting up the
listeners, so this now-dead connection will still be found in the
`pendingConnections` and may be returned to a future call to
`connectToNode` even if this call was made after all the
removed-listeners have been called.
With this commit we delay calling the removed-listeners until the
connection is closed and removed from both the `connectedNodes` and
`pendingConnections` maps.
Backport of #92546 to 7.17
Relates #100493
Today `TcpTransport#openConnection` may throw exceptions on certain
kinds of failure, but other kinds of failure are passed to the listener.
This is trappy and not all callers handle it correctly. This commit
makes sure that all exceptions are passed to the listener.
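
The pattern of routing every failure through the listener can be sketched
like this. The `Listener` and `Opener` interfaces and the
`openConnection` signature below are hypothetical stand-ins, not the
actual `TcpTransport` API.

```java
final class ListenerSketch {
    /** Hypothetical callback interface, loosely modelled on a listener. */
    interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    /** Hypothetical operation that may fail synchronously. */
    interface Opener<T> {
        T open() throws Exception;
    }

    /**
     * Never throws a failure from the opener to the caller: any exception
     * raised while opening is handed to the listener instead.
     */
    static <T> void openConnection(Opener<T> opener, Listener<T> listener) {
        final T result;
        try {
            result = opener.open();
        } catch (Exception e) {
            listener.onFailure(e);
            return;
        }
        // Called outside the try block so the listener is notified only once.
        listener.onResponse(result);
    }
}
```

With this shape, callers only need to handle failures in one place, the
listener's `onFailure` method.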
Closes #100510
These tests were muted both at the suite level as well as at the test level
for reasons I don't fully understand, and then were unmuted at one level
but not the other. They don't appear to fail after a few thousand runs,
so this PR unmutes them the rest of the way.
* Include branch information in build scans for buildkite jobs (#101284)
# Conflicts:
# build-tools-internal/src/main/groovy/elasticsearch.build-scan.gradle
* Align with other branches
* Remove translog from bwc testRecovery (#101068)
The test tried to verify recovery of translog ops, but because we flush
on close/shutdown there were never any translog ops to recover, so it
failed. That makes the translog recovery code in the test irrelevant,
and this PR removes it.
Alternatively, we could simulate killing nodes forcibly before
upgrading, but (a) that seems out of the ordinary for upgrades,
and (b) in trying that, it did not consistently pass the test
because sometimes the flush on close still happened.
Fixes #52031
* Fix
* `WaitForSnapshotStep` verifies if the index belongs to the latest snapshot of that SLM policy (#100911)
The `WaitForSnapshotStep` used to check if the SLM policy has been
executed after the index has entered the delete phase, but it did not
check if the SLM policy included this index.
As a result, if the user configured an SLM policy that did not include
this index, the index would enter the `WaitForSnapshotStep` and wait for
a snapshot to be taken. That snapshot would not include the index, and
ILM would then delete the index anyway.
See the exact reproduction path:
https://github.com/elastic/elasticsearch/issues/57809
**Solution** After finding a successful SLM run, the step now verifies
that the snapshot taken by SLM contains this index. If it does not, the
step throws an error; otherwise it proceeds.
ILM explain will report:
```
"step_info": {
"type": "illegal_state_exception",
"reason": "the last successful snapshot of policy 'hourly-snapshots' does not include index '.ds-my-other-stream-2023.10.16-000001'"
}
```
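
The containment check described above can be sketched as follows. This is
an illustration only: `SnapshotCheckSketch` and `ensureIndexInSnapshot`
are hypothetical names, and the real step obtains the snapshot's indices
asynchronously rather than from a list passed in by the caller.

```java
import java.util.List;

final class SnapshotCheckSketch {
    /**
     * Throws if the given index is missing from the indices of the last
     * successful snapshot; returns normally otherwise.
     */
    static void ensureIndexInSnapshot(String policy, String index, List<String> snapshotIndices) {
        if (!snapshotIndices.contains(index)) {
            throw new IllegalStateException(
                "the last successful snapshot of policy '" + policy
                    + "' does not include index '" + index + "'"
            );
        }
    }
}
```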
**Backwards compatibility concerns** In this PR, the
`WaitForSnapshotStep` changed from `ClusterStateWaitStep` to
`AsyncWaitStep`. We do not think this is going to cause an issue. This
was tested manually with the following steps:
- Run a master node with the old version.
- While ILM is executing `wait-for-snapshot`, shut down the node.
- Start the node again with the new version of ES.
- ES was able to pick up the step and continue with the new code.
We believe that this covers bwc concerns.
Fixes: https://github.com/elastic/elasticsearch/issues/57809
(cherry picked from commit 5697fcf594)