We duplicated these docs in order to avoid breaking older links, but
this makes it confusing and hard to link to the right copy of the
information. This commit removes the duplication by replacing the docs
at the old locations with stubs that link to the new locations.
Clarify that it's best to analyse the captures alongside the node logs,
and spell out in a bit more detail how to use packet captures and logs
to pin down the cause of a `disconnected` node.
We don't expect a cluster to run with `yellow` health for an extended
period of time, but it's not clear from these docs that it's important
to bring the cluster back to `green` health ASAP. This commit clarifies
these docs.
* (+Doc) Recover from "no_valid_shard_copy"
👋 @shainaraskas @DaveCTurner @anniegale9538 as follow-up to https://github.com/elastic/elasticsearch/pull/108263, this fixes the now targeted doc to make the recovery options look like alternatives rather than sequential steps.
* Apply suggestions from code review
Co-authored-by: Ievgen Degtiarenko <ievgen.degtiarenko@elastic.co>
---------
Co-authored-by: Ievgen Degtiarenko <ievgen.degtiarenko@elastic.co>
Since #104614 the top-level repo troubleshooting page is just a short
paragraph which talks about "this page" but in fact refers to
information spread across a number of subsequent pages. It's not obvious
to the reader that they need to use the navigation menu to get to the
information they seek. Moreover we link to this page from an exception
message today so there's a reasonable chance that users will find it
when trying to troubleshoot a genuine problem.
This commit rewords things slightly and adds links to the subsequent
pages to the body of the page to avoid this confusion.
* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path.
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated in the same way.
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to the few files that use it and remove it from `index.asciidoc`
This PR extends the repository integrity health indicator to also cover unknown and invalid repositories. Because these errors are local to a node, we extend the `LocalHealthMonitor` to monitor the repositories and report changes in their health with respect to the unknown or invalid status.
To simplify this extension in the future, we introduce the `HealthTracker` abstract class that can be used to create new local health checks.
Furthermore, we change the severity of the health status when the repository integrity indicator reports unhealthy from `RED` to `YELLOW` because even though this is a serious issue, there is no user impact yet.
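The change-reporting idea behind a local health tracker can be sketched roughly as follows. This is a minimal Python illustration, not the actual Java `HealthTracker` class: the method names (`check_current_health`, `track`), the `RepositoriesTracker` subclass, and the shape of the repository data are all hypothetical and chosen only to show the pattern of "compute local health, report only when it changes".

```python
from abc import ABC, abstractmethod


class HealthTracker(ABC):
    """Hypothetical base class: reports local health only when it changes."""

    def __init__(self):
        self._last_reported = None

    @abstractmethod
    def check_current_health(self):
        """Return a comparable snapshot of the current local health."""

    def track(self, report):
        """Call report(snapshot) if health changed; return True if reported."""
        current = self.check_current_health()
        if current == self._last_reported:
            return False
        report(current)
        self._last_reported = current
        return True


class RepositoriesTracker(HealthTracker):
    """Hypothetical tracker for unknown and invalid repositories on one node."""

    def __init__(self, repositories):
        super().__init__()
        # Assumed shape: repository name -> "ok", "unknown" or "invalid".
        self.repositories = repositories

    def check_current_health(self):
        unknown = tuple(sorted(n for n, s in self.repositories.items() if s == "unknown"))
        invalid = tuple(sorted(n for n, s in self.repositories.items() if s == "invalid"))
        # YELLOW rather than RED when unhealthy, as described above.
        status = "YELLOW" if unknown or invalid else "GREEN"
        return (status, unknown, invalid)


repos = {"backups": "ok", "archive": "ok"}
tracker = RepositoriesTracker(repos)
reported = []
tracker.track(reported.append)   # first check is always reported
tracker.track(reported.append)   # unchanged, so nothing is reported
repos["archive"] = "invalid"
tracker.track(reported.append)   # health changed, so it is reported again
```

A new local check would then only need to subclass the base class and implement the snapshot method, which is the kind of simplification this PR is aiming for.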
If a file header is corrupted then the exception may be reported as a
bad index format version rather than a checksum mismatch. This commit
adjusts the docs to cover this case.
These docs talk about files whose contents are unexpected, but we should
also mention that files which are completely missing are likewise due to
infrastructural problems.
Removes the recommendations to use the object field type and to set index: false.
Neither of these options is effective at avoiding mapping explosions.
* [DOCS] Add 'Troubleshooting an unstable cluster' to nav
* Adjust docs links in code
* Revert "Adjust docs links in code"
This reverts commit f3846b1d78.
---------
Co-authored-by: David Turner <david.turner@elastic.co>
Today the `current_node` parameter is given in several sample requests
illustrating how to explain an unassigned shard using the cluster
allocation explain API. This doesn't make sense: an unassigned shard has
no `current_node`. This commit removes the misleading parameter in these
cases.
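For illustration, a request body for explaining an unassigned shard might be built like this. It is a minimal Python sketch: the helper name and the index name `my-index-000001` are hypothetical, and the body would be sent with `GET _cluster/allocation/explain`.

```python
import json


def explain_unassigned_shard_body(index, shard, primary):
    """Build a cluster allocation explain request body for an unassigned shard."""
    # Deliberately no `current_node` key: an unassigned shard is not
    # allocated to any node, so there is nothing to put there.
    return {"index": index, "shard": shard, "primary": primary}


# Hypothetical index name, used only for this example.
body = explain_unassigned_shard_body("my-index-000001", 0, True)
print(json.dumps(body, indent=2))
```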
Discovery, like cluster membership, can also be affected by network-like
issues (e.g. GC/VM pauses, dropped packets and blocked threads) so this
commit duplicates the troubleshooting info across both places.
The `high-jvm-memory-pressure.html` troubleshooting docs give some
suggestions, but vitally they omit the advice to capture a heap dump
which is what we really need users to do if they want to understand
their high heap usage. This commit adds a note to the docs to that
effect.
* [DOC] Troubleshooting Expensive Searches
👋 re: https://github.com/elastic/elasticsearch/issues/73222 adds content that we can link users to, explaining how to find the source of expensive searches.
* Several edits
* Apply suggestions from code review
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Introduce max headroom settings for the low, high, and flood disk watermark stages, similar to the existing max headroom setting for the flood stage of the frozen tier. Introduce the new max headrooms in HealthMetadata and in ReactiveStorageDeciderService. Add multiple tests in DiskThresholdDeciderUnitTests, DiskThresholdDeciderTests and DiskThresholdMonitorTests. Also add addition, subtraction, and min operations to ByteSizeValue.
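The effect of a max headroom on a percentage-based watermark can be sketched as follows. This is a simplified Python illustration, not the Elasticsearch implementation: the function name is hypothetical, the 85% watermark and 200 GiB headroom are example values only, and it assumes the headroom simply caps the amount of free space the watermark requires.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def required_free_bytes(total_bytes, watermark_percent, max_headroom_bytes=None):
    """Free space a percentage watermark requires, optionally capped by a headroom."""
    # Integer arithmetic avoids floating-point rounding on large disks.
    free_by_percentage = total_bytes * (100 - watermark_percent) // 100
    if max_headroom_bytes is None:
        return free_by_percentage
    # Assumption: the headroom is an upper bound on the required free space.
    return min(free_by_percentage, max_headroom_bytes)


# On a small disk the percentage is the tighter limit.
small = required_free_bytes(1 * TIB, 85, 200 * GIB)
# On a large disk the headroom caps the requirement.
large = required_free_bytes(10 * TIB, 85, 200 * GIB)
```

Without a cap, 15% of a 10 TiB disk is 1.5 TiB of space that must stay free, which is the sort of waste on large disks that a max headroom avoids.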
This troubleshooting guide is what will be returned from the SLM health indicator
when an SLM policy has suffered too many repeated failures without a successful
execution.
Adds some docs giving more detailed background about what data
corruption really means and some suggestions about how to narrow down
the root cause.
Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>