62 commits

Author SHA1 Message Date
Stef Nestor
d039c280af
(Docs+) Flush out Resource+Task troubleshooting (#111773) (#112818)
* (Docs+) Flush out Resource+Task troubleshooting

---------

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Co-authored-by: David Turner <david.turner@elastic.co>
2024-09-13 00:09:58 +10:00
David Turner
9387ce3357
Deduplicate unstable-cluster troubleshooting docs (#112333)
We duplicated these docs in order to avoid breaking older links, but
this makes it confusing and hard to link to the right copy of the
information. This commit removes the duplication by replacing the docs
at the old locations with stubs that link to the new locations.
2024-08-29 13:16:37 +01:00
Stef Nestor
0dab4b0571
(Doc+) Removing "current_node" from Allocation Explain API under Fix Watermark Errors (#111946)
👋 howdy, team!

This simplifies the Allocation Explain API request so that it no longer includes `current_node`, which may not be known when following the [Fix Watermark Errors](https://www.elastic.co/guide/en/elasticsearch/reference/current/fix-watermark-errors.html) guide.

TIA!
Stef
2024-08-20 08:22:22 -06:00
David Turner
e5fd63bbb8
More detail around packet captures (#111835)
Clarify that it's best to analyse the captures alongside the node logs,
and spell out in a bit more detail how to use packet captures and logs
to pin down the cause of a `disconnected` node.
2024-08-13 21:55:38 +01:00
Stef Nestor
0a850548f5
Add link to flood-stage watermark exception message (#111315)
Links the exception when hitting the flood-stage watermark to docs about this
watermark and how to troubleshoot and resolve the problem.
2024-08-06 17:15:42 +01:00
David Turner
efd450ee19
Clarify that red/yellow health must be addressed (#109090)
We don't expect a cluster to run with `yellow` health for an extended
period of time, but it's not clear from these docs that it's important
to bring the cluster back to `green` health ASAP. This commit clarifies
these docs.
2024-05-28 11:43:54 -04:00
David Turner
42e5293c04
Capture GC logs alongside heap dumps (#109087)
GC logs can be important to understand a heap dump, especially if
there's lots of unreachable objects and the GC is struggling to keep up.
2024-05-28 04:54:04 -04:00
Stef Nestor
4ab4d8727f
(+Doc) Recover from "no_valid_shard_copy" (#108929)
* (+Doc) Recover from "no_valid_shard_copy"

👋 @shainaraskas @DaveCTurner @anniegale9538  as follow-up to https://github.com/elastic/elasticsearch/pull/108263, this fixes the now targeted doc to make the recovery options look like alternatives rather than sequential steps.

* Apply suggestions from code review

Co-authored-by: Ievgen Degtiarenko <ievgen.degtiarenko@elastic.co>

---------

Co-authored-by: Ievgen Degtiarenko <ievgen.degtiarenko@elastic.co>
2024-05-23 11:13:07 -06:00
Stef Nestor
1a55e2fa76
(Doc+) Capture Elasticsearch diagnostic (#108259)
* (Doc+) Capture Elasticsearch diagnostic

* add diagnostic topic to nav, chunk content, style edits

* fix test

---------

Co-authored-by: shainaraskas <shaina.raskas@elastic.co>
2024-05-09 10:27:19 -06:00
David Turner
9adf2422df
Add links to repo troubleshooting sub-pages (#107604)
Since #104614 the top-level repo troubleshooting page is just a short
paragraph which talks about "this page" but in fact refers to
information spread across a number of subsequent pages. It's not obvious
to the reader that they need to use the navigation menu to get to the
information they seek. Moreover we link to this page from an exception
message today so there's a reasonable chance that users will find it
when trying to troubleshoot a genuine problem.

This commit rewords things slightly and adds links to the subsequent
pages to the body of the page to avoid this confusion.
2024-04-18 12:24:45 +01:00
Liam Thompson
33a71e3289
[DOCS] Refactor book-scoped variables in docs/reference/index.asciidoc (#107413)
* Remove `es-test-dir` book-scoped variable

* Remove `plugins-examples-dir` book-scoped variable

* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables

- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to the fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated likewise

* Replace `es-repo-dir` with `es-ref-dir`

* Move `:include-xpack: true` to few files that use it, remove from index.asciidoc
2024-04-17 14:37:07 +02:00
Yang Wang
c0476c1efb
Trivial typo fix for #105774 (#106471)
As the title says.
2024-03-19 13:48:36 +01:00
Ievgen Degtiarenko
12299b89d0
Troubleshooting unbalanced cluster docs (#105774)
This adds an initial page that explains the balancing approach
and gives steps to troubleshoot it.
2024-03-14 14:10:13 +01:00
Stef Nestor
37542e6245
(Doc+) Link Troubleshooting Discover from Mapping Explosion (#105991)
👋 howdy team! [Mapping Explosion](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-explosion.html) is a common root issue of [Discover Slowness](https://www.elastic.co/blog/troubleshooting-guide-common-issues-kibana-discover-load), so cross-linking these Dev-reviewed pages.
2024-03-11 15:00:32 +01:00
Niels Bauman
64891011d3
Extend repository_integrity health indicator for unknown and invalid repos (#104614)
This PR extends the repository integrity health indicator to cover also unknown and invalid repositories. Because these errors are local to a node, we extend the `LocalHealthMonitor` to monitor the repositories and report the changes in their health regarding the unknown or invalid status.
To simplify this extension in the future, we introduce the `HealthTracker` abstract class that can be used to create new local health checks.
Furthermore, we change the severity of the health status when the repository integrity indicator reports unhealthy from `RED` to `YELLOW` because even though this is a serious issue, there is no user impact yet.
2024-02-07 15:18:55 +01:00
Felix Barnsteiner
f642b8a3aa
Add setting to ignore dynamic fields when field limit is reached (#96235)
Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit`
index setting.

When set to `true`, new fields are added to the mapping as long as the
field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields
that would exceed the limit are not added to the mapping, similar to
`dynamic: false`.  Ignored fields are added to the `_ignored` metadata
field.

Relates to https://github.com/elastic/elasticsearch/issues/89911

To make this easier to review, this is split into the following PRs:
- [x] https://github.com/elastic/elasticsearch/pull/102915
- [x] https://github.com/elastic/elasticsearch/pull/102936
- [x] https://github.com/elastic/elasticsearch/pull/104769

Related but not a prerequisite:
- [ ] https://github.com/elastic/elasticsearch/pull/102885
2024-02-02 05:53:52 -05:00
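The behaviour described in #96235 above can be illustrated with a small model. This is only a sketch, not Elasticsearch's implementation: the function `apply_dynamic_fields` and its parameters are hypothetical names, and it counts every field as one entry, whereas the real field limit counts mappings in more detail. It assumes that with the setting disabled, a dynamic field beyond the limit causes the request to be rejected.

```python
from typing import Iterable, List, Set, Tuple


def apply_dynamic_fields(
    mapped: Set[str],
    new_fields: Iterable[str],
    limit: int,
    ignore_dynamic_beyond_limit: bool,
) -> Tuple[Set[str], List[str]]:
    """Toy model of `index.mapping.total_fields.ignore_dynamic_beyond_limit`.

    Returns the updated set of mapped fields and the list of ignored fields
    (the fields that would be recorded in the `_ignored` metadata field).
    """
    result = set(mapped)
    ignored: List[str] = []
    for field in new_fields:
        if field in result:
            continue  # already mapped, nothing to add
        if len(result) < limit:
            result.add(field)  # still below the limit: map the new field
        elif ignore_dynamic_beyond_limit:
            ignored.append(field)  # beyond the limit: skip mapping, record as ignored
        else:
            # Assumption: without the setting, exceeding the limit is an error.
            raise ValueError(f"field limit of {limit} exceeded by [{field}]")
    return result, ignored


mapped, ignored = apply_dynamic_fields({"a", "b"}, ["c", "d", "e"], 3, True)
print(sorted(mapped), ignored)
```

With a limit of 3 and two fields already mapped, only the first new field is mapped; the remaining ones are ignored rather than failing the request.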
David Turner
312d4c2fa1
Mention IndexFormatToo{Old,New}Exception as corruption (#104204)
If a file header is corrupted then the exception may be reported as a
bad index format version rather than a checksum mismatch. This commit
adjusts the docs to cover this case.
2024-01-10 08:54:07 -05:00
David Turner
cf6632c3bd
Mention missing files in corruption troubleshooting docs (#103962)
These docs talk about files whose contents are unexpected, but we should
also mention that files which are completely missing are also going to
be due to infrastructural problems.
2024-01-05 05:42:59 -05:00
David Turner
92eae448e9
Clarify that we need stack dumps of the main process (#103391)
ES comprises more than one Java process, but it's the main one which
matters when looking at stack dumps.
2023-12-13 08:41:30 -05:00
David Turner
5dff56a00e
Mention network handler logging in docs (#100118)
Mentions the `InboundHandler` (and `OutboundHandler`) as potential
sources of useful log messages when tracking down a network threading
bug.
2023-10-02 08:52:16 +01:00
David Turner
bf34036c8c
Discovery troubleshooting next steps (#99743)
Adds a little more detail on how to react if you see evidence that the
Elasticsearch process is pausing for a long time due to long GCs or VM
pauses.
2023-09-21 13:00:13 +01:00
Felix Barnsteiner
ebd5ead943
Remove ineffective options of preventing mapping explosions (#99665)
Removes the recommendations to use the object field type and to set index: false.
Neither of these options is effective at preventing mapping explosions.
2023-09-20 13:59:03 +02:00
Abdon Pijpelink
af76a3a436
[DOCS] Add 'Troubleshooting an unstable cluster' to nav (#99287)
* [DOCS] Add 'Troubleshooting an unstable cluster' to nav

* Adjust docs links in code

* Revert "Adjust docs links in code"

This reverts commit f3846b1d78.

---------

Co-authored-by: David Turner <david.turner@elastic.co>
2023-09-08 13:42:50 +02:00
Stef Nestor
0781bafac1
[DOC+][Hot Spotting] Pull detailed Node Tasks (#98879)
Co-authored-by: David Turner <david.turner@elastic.co>
2023-08-29 14:25:10 -04:00
David Turner
ddd4ba5e30
Fix docs for explaining unassigned shards (#97538)
Today the `current_node` parameter is given in several sample requests
illustrating how to explain an unassigned shard using the cluster
allocation explain API. This doesn't make sense: an unassigned shard has
no `current_node`. This commit removes the misleading parameter in these
cases.
2023-07-11 08:01:12 +01:00
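As a sketch of the request shape that #97538 above describes, the helper below builds a body for the cluster allocation explain API (`GET _cluster/allocation/explain`) and leaves out `current_node` unless one is supplied. The helper name `build_allocation_explain_body` and the index name are hypothetical, and the code only constructs the JSON body; it does not send a request.

```python
import json
from typing import Any, Dict, Optional


def build_allocation_explain_body(
    index: str,
    shard: int,
    primary: bool,
    current_node: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body for the cluster allocation explain API."""
    body: Dict[str, Any] = {"index": index, "shard": shard, "primary": primary}
    # An unassigned shard has no current node, so the key is only
    # included when the caller explicitly provides one.
    if current_node is not None:
        body["current_node"] = current_node
    return body


# Body for explaining an unassigned replica of shard 0 (hypothetical index name).
print(json.dumps(build_allocation_explain_body("my-index-000001", 0, False), indent=2))
```

Keeping `current_node` optional means the same helper works for both assigned and unassigned shards without tempting the caller to fill in a placeholder node name.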
David Turner
09e53f9ad9
Enhance docs around network troubleshooting (#97305)
Discovery, like cluster membership, can also be affected by network-like
issues (e.g. GC/VM pauses, dropped packets and blocked threads) so this
commit duplicates the troubleshooting info across both places.
2023-07-10 10:57:44 +01:00
David Turner
846d640ddf
Suggest capturing a heap dump to diagnose high heap (#96526)
The `high-jvm-memory-pressure.html` troubleshooting docs give some
suggestions, but vitally they omit the advice to capture a heap dump
which is what we really need users to do if they want to understand
their high heap usage. This commit adds a note to the docs to that
effect.
2023-06-02 09:43:52 -04:00
debadair
777598d602
[DOCS] Remove redirect pages (#88738)
* [DOCS] Remove manual redirects

* [DOCS] Removed refs to modules-discovery-hosts-providers

* [DOCS] Fixed broken internal refs

* Fixing bad cross links in ES book, and adding redirects.asciidoc[] back into docs/reference/index.asciidoc.

* Update docs/reference/search/point-in-time-api.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/setup/restart-cluster.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/sql/endpoints/translate.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/snapshot-restore/restore-snapshot.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update repository-azure.asciidoc

* Update node-tool.asciidoc

* Update repository-azure.asciidoc

---------

Co-authored-by: amyjtechwriter <61687663+amyjtechwriter@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-05-24 12:32:46 +01:00
Stef Nestor
65b4fe28d4
[+DOC] Troubleshooting / Mapping Explosion (#95397)
* [+DOC] Troubleshooting / Mapping Explosion

---------

Co-authored-by: Steffanie Nestor <steffanie.nestor@elastic.co>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
2023-04-27 11:08:56 -06:00
David Kilfoyle
626db84fac
[Docs] Small fixes for hot spotting page (#95627)
2023-04-27 10:18:21 -04:00
Stef Nestor
4c5a3fb4da
[+Doc] Troubleshooting / Hot Spotting (#95429)
* [+Doc] Troubleshooting / Hot Spotting

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-04-26 12:29:47 -06:00
Stef Nestor
1ee528dc3f
[Doc] Troubleshoot Cluster State / Linkable subsections (#95468)
👋🏼 howdy, team! Could we make these sub-sections sub-header link-able?
2023-04-25 10:35:14 +02:00
Pablo Alcantar Morales
253fe6325d
Add shards capacity troubleshooting guide (#95208)
2023-04-19 09:24:07 +02:00
David Turner
b4b9292ce9
Small changes to corruption troubleshooting docs (#95265)
- Mention that third-party software may be to blame too
- Mention `strace` as a last resort
- Minor rewordings
2023-04-17 09:07:27 +01:00
Stef Nestor
e12e83fa37
Search-Troubleshoot | Most Recent Record (#94409)
May we add a section to [this page](https://www.elastic.co/guide/en/elasticsearch/reference/master/troubleshooting-searches.html#troubleshooting-check-field-values) to query for the latest record on an index (pattern)? This will be helpful to decide between Kibana Discover filter and Elasticsearch ingest lag problems.

---------

Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-03-23 08:04:49 -06:00
Abdon Pijpelink
2808512397
[DOCS] Improve watermark troubleshooting documentation (#94222)
2023-03-01 14:34:14 +01:00
Iraklis Psaroudakis
555a4d91ee
Update add-repository.asciidoc (#92945)
Our guide on re-registering a corrupt repository should link to the warnings about the potential side-effects of corruption.
2023-01-16 17:20:21 +02:00
Stef Nestor
d9cbefc19c
[DOC] Troubleshooting Expensive Searches (#92725)
* [DOC] Troubleshooting Expensive Searches

👋 re: https://github.com/elastic/elasticsearch/issues/73222 adds content that we can link users to, showing how to find the source of expensive searches.

* Several edits

* Apply suggestions from code review

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-01-13 09:55:13 +01:00
Cleydyr Bezerra de Albuquerque
ee452bd143
Update circuit-breaker-errors.asciidoc (#92070)
Fix typo fieldata -> fielddata
2022-12-05 10:53:06 +01:00
Mary Gouseti
cfd23d512f
Disk indicator troubleshooting guides (#90504)
2022-10-14 15:24:21 +02:00
Ievgen Degtiarenko
4d6d979e0e
Deprecate state field in /_cluster/reroute response (#90399)
2022-10-05 08:18:27 +02:00
Iraklis Psaroudakis
34471b1cd2
Introduce max headroom for disk watermark stages (#88639)
Introduce max headroom settings for the low, high, and flood disk watermark stages, similar to the existing max headroom setting for the flood stage of the frozen tier. Introduce new max headrooms in HealthMetadata and in ReactiveStorageDeciderService. Add multiple tests in DiskThresholdDeciderUnitTests, DiskThresholdDeciderTests and DiskThresholdMonitorTests. Also add addition, subtraction, and min operations for ByteSizeValue.
2022-09-19 14:59:18 +03:00
James Baiera
db73aa0498
Add repeated snapshot failure troubleshooting guide (#89762)
This troubleshooting guide is what the SLM health indicator returns
when an SLM policy has suffered too many repeated failures without a successful
execution.
2022-09-15 17:01:32 -04:00
Abdon Pijpelink
346f7848e6
[DOCS] Add troubleshooting searches guide (#89583)
* [DOCS] Adds troubleshooting searches guide

* Additional troubleshooting steps

* Apply review suggestions

* Replace separate _cat aliases/indices requests with one get indices call

* Reorder steps to move field caps forward

* Add note about ignore_unavailable

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2022-09-08 14:30:21 +02:00
Leaf-Lin
942e5fd9fc
Adding specific items into troubleshooting guide (#88105)
* Update troubleshooting.asciidoc

Adding items into the troubleshooting guide

* Resolve conflicts

* Reorganizes troubleshooting links

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2022-08-03 17:00:34 +02:00
David Turner
74ce7a4603
Fix typo (#89063)
2022-08-03 10:23:57 +01:00
David Turner
7103053f03
Add troubleshooting docs about data corruption (#88760)
Adds some docs giving more detailed background about what data
corruption really means and some suggestions about how to narrow down
the root cause.

Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
2022-07-28 11:23:23 +01:00
Mary Gouseti
89903bbe23
Troubleshooting docs for ACTION_RESTORE_FROM_SNAPSHOT (#87692)
Troubleshooting guide to restore indices and data streams that have
missing data from a snapshot.

This will be associated with the user action
`ACTION_RESTORE_FROM_SNAPSHOT`.

Preview link:
https://elasticsearch_87692.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/restore-from-snapshot.html
2022-07-27 23:37:08 +09:30
Abdon Pijpelink
26cc87360e
Split common cluster issues page into separate pages (#88495)
2022-07-18 17:54:02 +02:00
Andrei Dan
f3431e1bff
Add troubleshooting guide for corrupt repository (#88391)
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
2022-07-14 13:37:02 +01:00