* Remove translog from bwc testRecovery (#101068)
When the test was trying to test recovering translog ops,
since we flush on close/shutdown, it failed because it never
recovered any translog ops.
The code for translog recovery is irrelevant due to that and
this PR proposes to remove it.
Alternatively, we could simulate killing nodes forcibly before
upgrading, but (a) that seems out of the ordinary for upgrades,
and (b) in trying that, it did not consistently pass the test
because sometimes the flush on close still happened.
Fixes#52031
* Fix
* `WaitForSnapshotStep` verifies if the index belongs to the latest snapshot of that SLM policy (#100911)
The `WaitForSnapshotStep` used to check if the SLM policy has been
executed after the index has entered the delete phase, but it did not
check if the SLM policy included this index.
The result of this is that if the user used an SLM policy that did not
include this index, when the index would enter the
`WaitForSnapshotStep`, it would wait for a snapshot to be taken, a
snapshot that would not include the index, and then ILM would delete the
index.
See the exact reproduction path:
https://github.com/elastic/elasticsearch/issues/57809
**Solution** This PR, after it finds a successful SLM run, it verifies
if the snapshot taken by SLM contains this index. If not it throws an
error, otherwise it proceeds.
ILM explain will report:
```
"step_info": {
"type": "illegal_state_exception",
"reason": "the last successful snapshot of policy 'hourly-snapshots' does not include index '.ds-my-other-stream-2023.10.16-000001'"
}
```
**Backwards compatibility concerns** In this PR, the
`WaitForSnapshotStep` changed from `ClusterStateWaitStep` to
`AsyncWaitStep`. We do not think this is gonna cause an issue. This was
tested manually by the following steps: - Run a master node with the old
version. - When ILM is executing `wait-for-snapshot`, we shutdown the
node - We start the node again with the new version os ES - ES was able
to pick up the step and continue with the new code.
We believe that this covers bwc concerns.
Fixes: https://github.com/elastic/elasticsearch/issues/57809
(cherry picked from commit 5697fcf594)
Refactor testRerouteRecovery, pulling out testing of shard recovery
throttling into separate targeted tests. Now there are two additional
tests, one testing source node throttling, and another testing target
node throttling. Throttling both nodes at once leads to primarily the
source node registering throttling, while the target node mostly has
no cause to instigate throttling.
(cherry picked from commit 323d9366df)
* Update gradle wrapper to 8.3 (#97838)
Gradle now fully supports compiling, testing and running on Java 20.
Among other general performance improvements this release introduces --test-dry-run command line option that allows checking if tests are filtered or not by gradle.
Required updating nebula ospackage plugin as setuid was broken in gradle 8.3.
(cherry picked from commit b23e000c30)
# Conflicts:
# build-tools-internal/src/integTest/groovy/org/elasticsearch/gradle/internal/test/rest/LegacyYamlRestCompatTestPluginFuncTest.groovy
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/ElasticsearchJavaModulePathPlugin.java
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/test/rest/compat/compat/AbstractYamlRestCompatTestPlugin.java
# build-tools-internal/src/main/resources/minimumGradleVersion
# gradle/verification-metadata.xml
# gradle/wrapper/gradle-wrapper.jar
# gradlew
# x-pack/plugin/watcher/qa/with-monitoring/src/javaRestTest/java/org/elasticsearch/smoketest/MonitoringWithWatcherRestIT.java
* [7.17] Use patched nebula os package gradle plugin
* Update testingconvention precommit integ test
* Log a debug level message for deleting non-existing snapshot (#100479)
The new message helps pairing with the "deleting snapshots" log message
at info level.
(cherry picked from commit 2cfdb7a92d)
# Conflicts:
# server/src/main/java/org/elasticsearch/snapshots/SnapshotsService.java
* spotless
* compilation
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Test
`NodeConnectionsServiceTests/testEventuallyConnectsOnlyToAppliedNodes`
fails with
```
java.lang.AssertionError: not connected to {node_21}{21}{Smg5SSzlSAWdhBrP63KjTQ}{0.0.0.0}{0.0.0.0:7}{dmsw}
at __randomizedtesting.SeedInfo.seed([963D3EFA5F943C6E:6D1F74C5990D6DDD]:0)
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.elasticsearch.cluster.NodeConnectionsServiceTests.assertConnected(NodeConnectionsServiceTests.java:507)
at org.elasticsearch.cluster.NodeConnectionsServiceTests.assertConnectedExactlyToNodes(NodeConnectionsServiceTests.java:501)
at org.elasticsearch.cluster.NodeConnectionsServiceTests.assertConnectedExactlyToNodes(NodeConnectionsServiceTests.java:497)
at org.elasticsearch.cluster.NodeConnectionsServiceTests.lambda$testEventuallyConnectsOnlyToAppliedNodes$6(NodeConnectionsServiceTests.java:152)
at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1143)
at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1116)
```
Mute it.
Relates #100493
* [ML] Defend against negative datafeed start times (#100284)
A negative start time in the datafeed can cause significant disruption
to an entire cluster. This PR checks that the start time is greater
than or equal to 0 and throws an exception otherwise.
* Adjust backported test for 7.17
This PR adds a validation step to the end of an enrich policy run to ensure the integrity of the
enrich index that is about to be promoted.
(cherry picked from commit 225db3190a)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
There should be NullPointerException check and throw index not found exception to the response
so the user can understand what happens with the enrich index
---------
Co-authored-by: James Baiera <james.baiera@gmail.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
(cherry picked from commit ccc896d128)
# Conflicts:
# x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichCache.java
# x-pack/plugin/enrich/src/test/java/org/elasticsearch/xpack/enrich/EnrichCacheTests.java
Co-authored-by: puppylpg <shininglhb@163.com>
The previous docker compose plugin version trips on
the following docker-compose cli version
docker-compose version --short
2.22.0-desktop.2
(because of the trailing "desktop.2" part),
which is what's installed with the latest Docker Desktop on macOS.
Backport of #100059
* Update Gradle Wrapper to 8.2 (#96686)
- Convention usage has been deprecated and was fixed in our build files
- Fix test dependencies and deprecation
In a production cluster, I observed the `[scheduler]` thread stuck for a
while trying to delete index files that became unreferenced while
closing a search context. We shouldn't be doing I/O on the scheduler
thread. This commit moves it to a `SEARCH` thread instead.
The invalidateAll method is taking out the lru lock and segment locks in a different order to the put method, when the put is replacing an existing value. This results in a deadlock between the two methods as they try to swap locks. This fixes it by making sure invalidateAll takes out locks in the same order as put.
This is difficult to test because the put needs to be replacing an existing value, and invalidateAll clears the cache, resulting in subsequent puts not hitting the deadlock condition. A test that overrides some internal implementations to expose this particular deadlock will be coming later.