* Increase concurrent request of opening point-in-time (#96782) (#96957)
Today, we mistakenly throttle the open point-in-time API to 1 request per
node. As a result, opening a point-in-time across a large cluster can take
a significant amount of time and may eventually fail because target shards
have relocated or target indices managed by ILM have been deleted. Ideally,
we should batch the requests per node and eliminate this throttle
completely; however, that requires all clusters to be on the latest version.
This PR increases the number of concurrent requests from 1 to 5, which is
the default used by search.
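As a rough illustration only (a conceptual sketch, not the actual Elasticsearch implementation; all names are hypothetical), the change amounts to raising a per-node cap on in-flight shard requests from 1 to 5:
```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: cap in-flight open-PIT shard requests per target node.
class PerNodeRequestThrottle {
    private static final int MAX_CONCURRENT_REQUESTS_PER_NODE = 5; // previously effectively 1
    private final Map<String, Semaphore> permitsByNode = new ConcurrentHashMap<>();

    void withPermit(String nodeId, Runnable sendShardRequest) throws InterruptedException {
        Semaphore permits = permitsByNode
            .computeIfAbsent(nodeId, id -> new Semaphore(MAX_CONCURRENT_REQUESTS_PER_NODE));
        permits.acquire(); // blocks in this sketch; the real code is asynchronous
        try {
            sendShardRequest.run();
        } finally {
            permits.release();
        }
    }
}
```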
* Fix tests
* Fix tests
Reporting the `targetNodeName` was added to `main` in #78727 but omitted
from the backport in #78865. This commit adds the missing field to the
`toString()` response.
This test was using the wrong `DiscoveryNodes`, but that mistake was
hidden by other leniency elsewhere in this test suite. This commit fixes
the test bug and also makes the test suite stricter.
Closes #93729
(cherry picked from commit 774e396ed5)
Co-authored-by: David Turner <david.turner@elastic.co>
We continue to see CI failures on Windows due to open files when trying to
clean up. This commit tries to account for one of those cases, where the
out/err redirect files are cleaned up, by retrying once after a delay.
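A minimal sketch of the retry-once idea, assuming a plain `java.nio.file` helper (class and method names are hypothetical):
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RedirectCleanup {
    // Delete a redirect file, retrying once after a short delay in case
    // Windows still reports the file as held open by another process.
    static void deleteWithOneRetry(Path redirectFile) throws IOException, InterruptedException {
        try {
            Files.deleteIfExists(redirectFile);
        } catch (IOException firstAttempt) {
            Thread.sleep(1000L); // give the owning process a moment to release its handle
            Files.deleteIfExists(redirectFile);
        }
    }
}
```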
This action can become fairly expensive for large states. Plus, it is
called at high rates on e.g. Cloud, needlessly blocking transport threads
in large deployments. Let's fork it to MANAGEMENT like we do for similar
CPU-bound actions.
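Conceptually the change looks like the sketch below, with a generic executor standing in for the MANAGEMENT pool (illustrative names only, not the actual transport action code):
```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

class ForkedStatsHandler<Response> {
    // Stand-in for the MANAGEMENT thread pool.
    private final ExecutorService managementExecutor = Executors.newFixedThreadPool(5);

    void handle(Supplier<Response> buildExpensiveResponse, Consumer<Response> sendResponse) {
        // Fork the CPU-bound work so the calling transport (network) thread returns immediately.
        managementExecutor.execute(() -> sendResponse.accept(buildExpensiveResponse.get()));
    }
}
```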
Backport of #90621 to 7.17
Co-authored-by: Armin Braun <me@obrown.io>
Avoids an O(#nodes) iteration by tracking the number of fetches directly.
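A minimal sketch of the idea with illustrative names: maintain a running count that is updated as fetches start and finish, rather than re-deriving it by scanning every node:
```java
class FetchCounter {
    private int activeFetches; // updated incrementally, so reading it is O(1)

    synchronized void onFetchStarted() {
        activeFetches++;
    }

    synchronized void onFetchCompleted() {
        activeFetches--;
    }

    synchronized int activeFetches() {
        // Previously this value was effectively recomputed by an O(#nodes) scan.
        return activeFetches;
    }
}
```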
Backport of #93632 to 7.17
Co-authored-by: luyuncheng <luyuncheng@bytedance.com>
- Bury these docs a little more, there's no need for them to be on the landing page for "Set up Elasticsearch".
- Clarify the responsibilities for JDK updates.
The test was failing when responseDelay == leaderCheckTimeoutMillis.
This scheduled both the response handling and the timeout at the same
millisecond and executed them in random order. The fix makes it impossible
for the reply to arrive at the exact time the request times out, since the
behavior is not deterministic in that case.
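A sketch of the fix under stated assumptions (a hypothetical helper, not the actual test code): the simulated response delay is drawn strictly below the timeout, so the two events can never share a millisecond:
```java
import java.util.Random;

class ResponseDelays {
    // Returns a delay in [0, leaderCheckTimeoutMillis), i.e. never equal to the timeout.
    static long chooseResponseDelay(Random random, long leaderCheckTimeoutMillis) {
        return Math.floorMod(random.nextLong(), leaderCheckTimeoutMillis);
    }
}
```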
File-based service tokens were added to support orchestration
requirements in environments such as ECE and ECK. Outside of these
environments we recommend using API-based tokens instead.
Resolves: #83491
Our Docker exclusion list is sensitive to minor versions. SLES 15.4
doesn't play nicely with Docker in CI, so we need to add it to the
exclusion list explicitly.
Closes https://github.com/elastic/elasticsearch/issues/93898
# Conflicts:
# .ci/dockerOnLinuxExclusions
Cold cache prewarming tasks are not stopped immediately when
the shard is closed, causing needless disk usage.
This change adds a Supplier<Boolean> to the prewarming logic that
can be checked before executing any expensive operation to find out
whether the Store is closing.
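A minimal sketch of that check, using hypothetical names for the task and its work items:
```java
import java.util.List;
import java.util.function.Supplier;

class PrewarmingTask {
    // Check the supplier before each chunk of work so prewarming stops promptly
    // once the store starts closing, instead of reading data for a closed shard.
    void prewarm(List<Runnable> fileParts, Supplier<Boolean> storeClosing) {
        for (Runnable part : fileParts) {
            if (storeClosing.get()) {
                return; // stop early, the shard is going away
            }
            part.run();
        }
    }
}
```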
I'm not super happy with my test changes, but the logic for
checking that prewarming works correctly is tricky.
Closes #95504
The migrate action (although not allowed in the frozen phase) would seem
to convert `frozen` to the `data_frozen,data_cold,data_warm,data_hot` tier
configuration. Since the migrate action is not allowed in the frozen phase
this could never actually happen, but the code is confusing because it
looks like it could.
The migrate-to-data-tiers routing service shared the code used by the
`migrate` action, which converted `frozen` to
`data_frozen,data_cold,data_warm,data_hot` whenever it encountered an
index without any `_tier_preference` setting but with a custom node
attribute configured to `frozen`, e.g. `include.data: frozen`.
As part of https://github.com/elastic/elasticsearch/issues/84758 we have
seen frozen indices with the `data_frozen,data_cold,data_warm,data_hot`
tier preference however we could never reproduce it.
Relates to https://github.com/elastic/elasticsearch/issues/84758
We run the same request back to back for each put-follower call during
the restore. Also, concurrent put-follower calls will all run the same
full CS request concurrently.
In older versions, prior to https://github.com/elastic/elasticsearch/pull/87235,
the concurrency was limited by the size of the snapshot pool. With that
fix, though, the requests run at almost arbitrary concurrency when many
put-follow requests are executed concurrently.
This is fixed by using the existing deduplicator to only run a single
remote CS request at a time for each CCR repository.
This change also removes the needless forking in the put-follower action,
which is no longer necessary now that the CCR repository is non-blocking
(we do the same for normal restores, which can safely be started from a
transport thread). That should fix some bad-UX situations where the
snapshot threads are busy on master, preventing the put-follower requests
from going through in time.
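The deduplication pattern is roughly the following (a conceptual sketch with hypothetical names, not the actual CCR deduplicator): concurrent callers for the same repository share a single in-flight future:
```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class RemoteStateDeduplicator<T> {
    private final Map<String, CompletableFuture<T>> inFlight = new ConcurrentHashMap<>();

    CompletableFuture<T> fetch(String repositoryName, Supplier<CompletableFuture<T>> send) {
        CompletableFuture<T> shared = new CompletableFuture<>();
        CompletableFuture<T> existing = inFlight.putIfAbsent(repositoryName, shared);
        if (existing != null) {
            return existing; // join the remote CS request that is already running
        }
        send.get().whenComplete((response, error) -> {
            inFlight.remove(repositoryName, shared); // allow the next request through
            if (error != null) {
                shared.completeExceptionally(error);
            } else {
                shared.complete(response);
            }
        });
        return shared;
    }
}
```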
The OWASP Java HTML Sanitizer has a dependency on Guava,
specifically on version 30.1. However, prior to this PR,
we had been using 27.1. While this does not seem to have
caused any issues, we should still correct the problem.
(cherry picked from commit f2f45ad1cf)
This commit adds the ability to initialize YAML rest test suites against
a subset of available test cases. Previously, the only way to do this was
via the `tests.rest.suite` system property, but that can only be set at
the test _task_ level. Configuring this at the test _class_ level means
that we can support having multiple test suite classes that execute
subsets of tests within a project. That allows for things like
parallelization, or having different test cluster setups for different
YAML tests within the same project.
For example:
```java
@ParametersFactory
public static Iterable<Object[]> parameters() throws Exception {
    return ESClientYamlSuiteTestCase.createParameters(new String[] { "analysis-common", "indices.analyze" });
}
```
The above example would mean that only tests in the `analysis-common`
and `indices.analyze` directories would be included in this suite.
cc @jdconrad
Closes #95089
When parsing role descriptors, we ensure that the FieldPermissions
(`"field_security":{ "grant":[ ... ], "except":[ ... ] }`) are valid -
that is, that any patterns compile correctly and that the "except" is a
subset of the "grant".
However, the previous implementation would not use the
FieldPermissionsCache for this, so it would compile (union, intersect
& minimize) automatons every time a role was parsed.
This was particularly an issue when parsing roles (from the security
index) in the GET /_security/role/ endpoint. If there were a large
number of roles with field level security the automaton parsing could
have significant impact on the performance of this API.
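The caching approach can be sketched roughly like this (hypothetical names, not the actual FieldPermissionsCache API): compiled results are keyed by their grant/except patterns and reused across role parses:
```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class CompiledFieldPermissionsCache<T> {
    private final Map<List<List<String>>, T> cache = new ConcurrentHashMap<>();

    // Compile (union, intersect & minimize) only on a cache miss; later role
    // parses with the same grant/except patterns reuse the cached result.
    T getOrCompile(List<String> grant, List<String> except, Function<List<List<String>>, T> compile) {
        List<List<String>> key = List.of(List.copyOf(grant), List.copyOf(except));
        return cache.computeIfAbsent(key, compile);
    }
}
```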
Backport of: #94931
* [DOCS] clarify v7 file realm configuration
* Update x-pack/docs/en/security/authentication/configuring-file-realm.asciidoc
Co-authored-by: Yang Wang <ywangd@gmail.com>
---------
Co-authored-by: Yang Wang <ywangd@gmail.com>
This PR ups the timeout on the EnrichExecutor's task API call and adds additional logic in the event
that the task await call fails. Without this change, the task API call can time out and unlock the policy
prematurely. Premature unlocking can lead to the index being removed while the policy is executing.
(cherry picked from commit f56fc01222)
# Conflicts:
# x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichPolicyExecutor.java
# x-pack/plugin/enrich/src/test/java/org/elasticsearch/xpack/enrich/EnrichPolicyExecutorTests.java
This PR refactors the locking logic for enrich policies so that enrich index names are resolved early
and can be explicitly protected from maintenance tasks on the master node. The
maintenance service has been optimized to allow concurrent removal of old enrich indices while
policies are executing. Further concurrency changes were made to improve the thread safety of the
system (such as removing the double-checked locking in maintenance and removing the ability to
unlock policies from code that does not hold the lock).
(cherry picked from commit 998520e111)