Missed a spot here when moving this to delayed deserialization: we can leak pending batch results on exceptions.
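The leak pattern can be sketched with a hypothetical ref-counted `PendingBatch` type. If consuming one batch throws, the failing batch and every batch after it still have to be released. `PendingBatches`, `PendingBatch`, `consumeAll`, and `release()` are illustrative names, not the classes touched by this change.

```java
import java.util.List;
import java.util.function.Consumer;

final class PendingBatches {
    // Hypothetical stand-in for a still-serialized batch result that holds a reference until released.
    static final class PendingBatch {
        final int id;
        boolean released;

        PendingBatch(int id) {
            this.id = id;
        }

        void release() {
            released = true;
        }
    }

    // Deserializes and consumes each pending batch in order. If the consumer throws,
    // the failing batch and every batch after it are released before the exception propagates.
    static void consumeAll(List<PendingBatch> pending, Consumer<PendingBatch> consumer) {
        int i = 0;
        try {
            for (; i < pending.size(); i++) {
                PendingBatch batch = pending.get(i);
                consumer.accept(batch);
                batch.release();
            }
        } finally {
            // On success i == pending.size(), so nothing is released twice.
            for (; i < pending.size(); i++) {
                pending.get(i).release();
            }
        }
    }
}
```

The `finally` block covers the exception path without a separate catch, so the failure still reaches the caller unchanged.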
Closes #126994, closes #126995, closes #126975, closes #126999, closes #127001, closes #126974, closes #127008
We addressed the empty top docs issue in #126385, specifically for scenarios where
empty top docs don't go through the wire. However, they may be serialized from the data node
back to the coordinating node, in which case they are no longer equal to Lucene#EMPTY_TOP_DOCS.
This commit extends the existing filtering of empty top docs to also include those that
went through serialization.
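The difference between the two checks can be illustrated with a minimal stand-in for Lucene's `TopDocs`. The `TopDocs` record, `isEmpty`, and `nonEmpty` below are hypothetical, and the exact emptiness criterion used by the commit is an assumption. A reference comparison only matches the shared singleton, while a structural check also matches an empty instance rebuilt on the coordinating node by deserialization.

```java
import java.util.List;

final class TopDocsFilter {
    // Hypothetical stand-in for Lucene's TopDocs: a total hit count and the returned doc ids.
    record TopDocs(long totalHits, int[] scoreDocs) {}

    // Hypothetical shared singleton, playing the role of Lucene#EMPTY_TOP_DOCS.
    static final TopDocs EMPTY_TOP_DOCS = new TopDocs(0, new int[0]);

    // Only matches the singleton itself; an instance rebuilt by deserialization is a different object.
    static boolean isEmptyByReference(TopDocs topDocs) {
        return topDocs == EMPTY_TOP_DOCS;
    }

    // Matches any instance with no hits, whether or not it went over the wire.
    static boolean isEmpty(TopDocs topDocs) {
        return topDocs.totalHits() == 0 && topDocs.scoreDocs().length == 0;
    }

    // Drops empty per-shard results before they are merged.
    static List<TopDocs> nonEmpty(List<TopDocs> shardResults) {
        return shardResults.stream().filter(topDocs -> !isEmpty(topDocs)).toList();
    }
}
```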
Closes #126742
These tests could fail when subsequent requests hit
different nodes that had different versions of the cluster state.
Only one of these tests has failed so far, but we fix the others
proactively to avoid future failures.
Fixes #126746
The following order of events was possible:
- An ILM policy update cleared `cachedSteps`
- ILM retrieves the step definition for an index, which populates `cachedSteps` with the outdated policy
- The updated policy is put in `lifecyclePolicyMap`
All subsequent cache retrievals then see the old step definition.
By clearing `cachedSteps` _after_ we update `lifecyclePolicyMap`, we
ensure eventual consistency between the policy and the cache.
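The race and its fix can be reproduced deterministically with a simplified model in which a `Runnable` stands in for a concurrent reader. `PolicyStepCache`, the integer policy versions, and the string step definitions are hypothetical; only the names `cachedSteps` and `lifecyclePolicyMap` come from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PolicyStepCache {
    // Policy name -> policy version (hypothetical simplification of the real policy map).
    final Map<String, Integer> lifecyclePolicyMap = new ConcurrentHashMap<>();
    // Index name -> step definition derived from whichever policy version was current when it was cached.
    final Map<String, String> cachedSteps = new ConcurrentHashMap<>();

    String getStep(String index, String policy) {
        return cachedSteps.computeIfAbsent(index, i -> policy + "@v" + lifecyclePolicyMap.get(policy));
    }

    // Old ordering: a reader running between clear() and put() re-caches the outdated policy.
    void updateClearingFirst(String policy, int version, Runnable concurrentReader) {
        cachedSteps.clear();
        concurrentReader.run();
        lifecyclePolicyMap.put(policy, version);
    }

    // New ordering: publish the policy first, then clear, so any entry cached before the clear is discarded.
    void updateClearingAfter(String policy, int version, Runnable concurrentReader) {
        lifecyclePolicyMap.put(policy, version);
        concurrentReader.run();
        cachedSteps.clear();
    }
}
```

With the second ordering a reader may still briefly observe the old step, but anything it cached is wiped by the following clear, which is the eventual consistency the fix relies on.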
Fixes #118406
Catching Exception instead of AmazonClientException in copyBlob and
executeMultipart led to failures in S3RepositoryAnalysisRestIT: the
injected exceptions were wrapped in IOExceptions, which prevented
them from being caught and handled in BlobAnalyzeAction.
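Why the catch scope matters can be sketched with hypothetical `ClientException` (standing in for `AmazonClientException`) and `InjectedException` types rather than the real AWS SDK and test classes. The broad catch turns the injected exception into an `IOException`, so a caller that catches the injected type directly never sees it.

```java
import java.io.IOException;

final class CatchScopeSketch {
    // Hypothetical stand-in for the SDK's client exception type.
    static class ClientException extends RuntimeException {
        ClientException(String message) {
            super(message);
        }
    }

    // Hypothetical exception injected by the test harness.
    static class InjectedException extends RuntimeException {
        InjectedException(String message) {
            super(message);
        }
    }

    // Broad catch: every exception, including the injected one, is wrapped in an IOException.
    static void copyBlobBroad(Runnable sdkCall) throws IOException {
        try {
            sdkCall.run();
        } catch (Exception e) {
            throw new IOException("copy failed", e);
        }
    }

    // Narrow catch: only client exceptions are wrapped; anything else propagates unchanged.
    static void copyBlobNarrow(Runnable sdkCall) throws IOException {
        try {
            sdkCall.run();
        } catch (ClientException e) {
            throw new IOException("copy failed", e);
        }
    }
}
```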
Closes #126576
With the addition of copy coverage in the repository analyzer, the
blob count no longer maps 1:1 to the number of blob analyzer requests:
a request that creates a copy counts as two blobs. This can make
testFailsOnWriteException fail intermittently, because the test injects
a failure at a random point between the first and the blobCount-th
request, which may never be reached if enough of the requests create copies.
The simple fix is to inject the failure within the first blobCount/2 requests,
which are always issued even if every request creates a copy. An alternative
would be to add a knob to the request that disallows copies and to use it
in this test.
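The arithmetic behind the fix can be checked with a small model in which each request writes one blob, or two if it creates a copy. `FailureInjectionSketch`, `requestsNeeded`, and the `BooleanSupplier` that decides whether a request copies are hypothetical. With `blobCount = 10` and every request copying, only 5 requests are issued, so a failure scheduled for request 7 is never injected, while any point within the first `blobCount / 2 = 5` requests is always reached.

```java
import java.util.function.BooleanSupplier;

final class FailureInjectionSketch {
    // Number of analyzer requests issued to write `blobCount` blobs, where each request writes
    // one blob, or two if it also creates a copy (hypothetical model of the analyzer).
    static int requestsNeeded(int blobCount, BooleanSupplier wantsCopy) {
        int blobs = 0;
        int requests = 0;
        while (blobs < blobCount) {
            requests++;
            // A copy is only possible if it does not overshoot the blob count.
            boolean copy = blobs + 2 <= blobCount && wantsCopy.getAsBoolean();
            blobs += copy ? 2 : 1;
        }
        return requests;
    }

    // Largest request index at which a failure is guaranteed to be injected.
    static int safeFailureBound(int blobCount) {
        return blobCount / 2;
    }
}
```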
Closes #126747
Recently we changed the implementation of
`testDataStreamLifecycleDownsampleRollingRestart` to use a temporary
state listener. We missed that the listener also had its own timeout, which was
considerably shorter than the `safeGet` timeout we configured. In this PR
we align these two timeouts.
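The effect of mismatched timeouts can be reproduced with plain `CompletableFuture`s, where `orTimeout` plays the role of the listener's timeout and `get(timeout, unit)` the role of `safeGet`. This is an analogy built on JDK APIs, not the test's actual code, and `AlignedTimeouts` is a hypothetical name. If the condition is reached after 300 ms, a 50 ms listener timeout fails the wait even though the waiter itself would have waited a full second.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class AlignedTimeouts {
    // Waits for `condition` with two independent timeouts: one on the listener side
    // (orTimeout completes the future exceptionally) and one on the waiter side (get).
    // The effective timeout is the shorter of the two.
    static <T> T awaitCondition(CompletableFuture<T> condition, long listenerTimeoutMillis, long waitTimeoutMillis)
            throws Exception {
        condition.orTimeout(listenerTimeoutMillis, TimeUnit.MILLISECONDS);
        return condition.get(waitTimeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Simulates a cluster-state condition that becomes true after `delayMillis`.
    static CompletableFuture<String> conditionReachedAfter(long delayMillis) {
        return CompletableFuture.supplyAsync(
            () -> "reached",
            CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
    }
}
```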
Fixes: #123769
I suspect the test resets/closes the reference manager
between the refresh and the retrieval of the segment
generation after the refresh.
By executing segmentGenerationAfterRefresh while
holding the engine reset lock, we ensure that no
engine reset can happen concurrently.
In the future, we should also ensure that
IndexShard.refresh() uses withEngine.
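A sketch of the locking idea, assuming a read/write lock where resets take the write lock and `withEngine` takes the read lock (an assumption; the actual lock in `IndexShard` may differ). `EngineHolder`, `Engine`, and `resetEngine` are hypothetical. Because the refresh and the generation read happen inside one `withEngine` call, both observe the same engine instance.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

final class EngineHolder {
    // Hypothetical engine exposing a refresh and a segment generation counter.
    static final class Engine {
        private long generation;
        private volatile boolean closed;

        void refresh() {
            if (closed) {
                throw new IllegalStateException("engine is closed");
            }
            generation++;
        }

        long generation() {
            if (closed) {
                throw new IllegalStateException("engine is closed");
            }
            return generation;
        }

        boolean isClosed() {
            return closed;
        }

        void close() {
            closed = true;
        }
    }

    // Assumed lock: readers run operations, a reset takes exclusive access.
    private final ReentrantReadWriteLock engineResetLock = new ReentrantReadWriteLock();
    private Engine engine = new Engine(); // guarded by engineResetLock

    <T> T withEngine(Function<Engine, T> operation) {
        engineResetLock.readLock().lock();
        try {
            return operation.apply(engine);
        } finally {
            engineResetLock.readLock().unlock();
        }
    }

    void resetEngine() {
        engineResetLock.writeLock().lock();
        try {
            engine.close();
            engine = new Engine();
        } finally {
            engineResetLock.writeLock().unlock();
        }
    }

    // Refresh and read the generation on the same engine, with no reset in between.
    long segmentGenerationAfterRefresh() {
        return withEngine(e -> {
            e.refresh();
            return e.generation();
        });
    }
}
```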
Closes #126628