mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
[DOCS] Add 'Troubleshooting an unstable cluster' to nav (#99287)
* [DOCS] Add 'Troubleshooting an unstable cluster' to nav
* Adjust docs links in code
* Revert "Adjust docs links in code"
This reverts commit f3846b1d78.
---------
Co-authored-by: David Turner <david.turner@elastic.co>
parent f5871af929
commit af76a3a436
3 changed files with 14 additions and 3 deletions
@@ -35,7 +35,7 @@ starting from the beginning of the cluster state update. Refer to
 
 [[cluster-fault-detection-troubleshooting]]
 ==== Troubleshooting an unstable cluster
-
+//tag::troubleshooting[]
 Normally, a node will only leave a cluster if deliberately shut down. If a node
 leaves the cluster unexpectedly, it's important to address the cause. A cluster
 in which nodes leave unexpectedly is unstable and can create several issues.
@@ -143,6 +143,7 @@ removes the node removed after three consecutively failed health checks. Refer
 to <<modules-discovery-settings>> for information about the settings which
 control this mechanism.
 
+[discrete]
 ===== Diagnosing `disconnected` nodes
 
 Nodes typically leave the cluster with reason `disconnected` when they shut
@@ -181,6 +182,7 @@ In extreme cases, you may need to take packet captures using `tcpdump` to
 determine whether messages between nodes are being dropped or rejected by some
 other device on the network.
 
+[discrete]
 ===== Diagnosing `lagging` nodes
 
 {es} needs every node to process cluster state updates reasonably quickly. If a
@@ -225,6 +227,7 @@ To reconstruct the output, base64-decode the data and decompress it using
 cat lagdetector.log | sed -e 's/.*://' | base64 --decode | gzip --decompress
 ----
 
+[discrete]
 ===== Diagnosing `follower check retry count exceeded` nodes
 
 Nodes sometimes leave the cluster with reason `follower check retry count
@@ -260,6 +263,7 @@ By default the follower checks will time out after 30s, so if node departures
 are unpredictable then capture stack dumps every 15s to be sure that at least
 one stack dump was taken at the right time.
 
+[discrete]
 ===== Diagnosing `ShardLockObtainFailedException` failures
 
 If a node leaves and rejoins the cluster then {es} will usually shut down and
@@ -295,3 +299,4 @@ To reconstruct the output, base64-decode the data and decompress it using
 ----
 cat shardlock.log | sed -e 's/.*://' | base64 --decode | gzip --decompress
 ----
+//end::troubleshooting[]
@@ -48,8 +48,8 @@ fix problems that an {es} deployment might encounter.
 
 [discrete]
 [[troubleshooting-others]]
-=== Others
-* <<cluster-fault-detection-troubleshooting,Troubleshooting an unstable cluster>>
+=== Other issues
+* <<troubleshooting-unstable-cluster,Troubleshooting an unstable cluster>>
 * <<discovery-troubleshooting,Troubleshooting discovery>>
 * <<monitoring-troubleshooting,Troubleshooting monitoring>>
 * <<transform-troubleshooting,Troubleshooting transforms>>
@@ -117,6 +117,8 @@ include::troubleshooting/snapshot/add-repository.asciidoc[]
 
 include::troubleshooting/snapshot/repeated-snapshot-failures.asciidoc[]
 
+include::troubleshooting/troubleshooting-unstable-cluster.asciidoc[]
+
 include::troubleshooting/discovery-issues.asciidoc[]
 
 include::monitoring/troubleshooting.asciidoc[]
@@ -0,0 +1,4 @@
+[[troubleshooting-unstable-cluster]]
+== Troubleshooting an unstable cluster
+
+include::../modules/discovery/fault-detection.asciidoc[tag=troubleshooting,leveloffset=-2]