Commit graph

22 commits

Author SHA1 Message Date
Niels Bauman
64891011d3
Extend repository_integrity health indicator for unknown and invalid repos (#104614)
This PR extends the repository integrity health indicator to also cover unknown and invalid repositories. Because these errors are local to a node, we extend the `LocalHealthMonitor` to monitor the repositories and report changes in their health with respect to the unknown or invalid status.
To simplify this extension in the future, we introduce the `HealthTracker` abstract class that can be used to create new local health checks.
Furthermore, we change the severity of the health status when the repository integrity indicator reports unhealthy from `RED` to `YELLOW` because even though this is a serious issue, there is no user impact yet.
2024-02-07 15:18:55 +01:00
Dianna Hohensee
a25e176692
Add node "roles" to allocation explain response (#98550)
Report node "roles" in the /_cluster/allocation/explain response.
Nodes with limited sets of roles may affect shard distribution in ways
users did not originally consider, so it is helpful to surface this
information along with node allocation decision explanations.
2023-08-23 08:30:35 -04:00
Pablo Alcantar Morales
253fe6325d
Add shards capacity troubleshooting guide (#95208) 2023-04-19 09:24:07 +02:00
Andrei Dan
9170113036
[HealthAPI] Add support for the FEATURE_STATE affected resource (#92296)
The `shards_availability` indicator diagnoses the condition where
indices need to be restored from snapshot.
Starting with 8.0, using feature_states when restoring from snapshot is
mandatory.

This adds support for the `FEATURE_STATE` affected resource to aid with
building up the snapshot restore API call (which will need to include
all the indices and feature states reported by the restore-from-snapshot
diagnosis).

Note that the health API will not report any indices that are part of a
feature state.
2022-12-20 13:39:41 +00:00
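As a rough illustration of the commit above, the sketch below shows how the indices and feature states reported by a restore-from-snapshot diagnosis might be combined into a snapshot restore request body. The `affected_resources` dictionary shape, the function name `build_restore_request`, and the sample values are assumptions made for illustration; only the `indices` and `feature_states` request fields are taken from the snapshot restore API.

```python
import json


def build_restore_request(affected_resources):
    """Build a restore request body from a diagnosis' affected resources.

    The input shape (keys "indices" and "feature_states") is an assumption
    for this sketch, not a confirmed response format.
    """
    indices = affected_resources.get("indices", [])
    feature_states = affected_resources.get("feature_states", [])

    body = {}
    if indices:
        # The restore API accepts a comma-separated list of index names.
        body["indices"] = ",".join(indices)
    if feature_states:
        # Feature states are passed as an array of feature names.
        body["feature_states"] = list(feature_states)
    return body


# Hypothetical diagnosis output, for illustration only.
example = {"indices": ["my-index-000001"], "feature_states": ["kibana"]}
request_body = build_restore_request(example)
print(json.dumps(request_body, sort_keys=True))
```

The resulting dictionary could then be sent as the JSON body of a restore call against a chosen repository and snapshot.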
Mary Gouseti
cfd23d512f
Disk indicator troubleshooting guides (#90504) 2022-10-14 15:24:21 +02:00
James Baiera
db73aa0498
Add repeated snapshot failure troubleshooting guide (#89762)
This troubleshooting guide is what will be returned from the SLM health indicator
when an SLM policy has suffered too many repeated failures without a successful
execution.
2022-09-15 17:01:32 -04:00
Iraklis Psaroudakis
d83ed3315a
Re-registering corrupt repository unblocks it (#89719)
Fixes #89130
2022-09-12 20:21:35 +03:00
Mary Gouseti
89903bbe23
Troubleshooting docs for ACTION_RESTORE_FROM_SNAPSHOT (#87692)
Troubleshooting guide to restore indices and data streams that have
missing data from a snapshot.

This will be associated with the user action
`ACTION_RESTORE_FROM_SNAPSHOT`.

Preview link:
https://elasticsearch_87692.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/restore-from-snapshot.html
2022-07-27 23:37:08 +09:30
Mary Gouseti
0f670404f6
Fix Note in troubleshooting docs (#88846) 2022-07-27 14:31:06 +02:00
Andrei Dan
f3431e1bff
Add troubleshooting guide for corrupt repository (#88391)
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
2022-07-14 13:37:02 +01:00
Andrei Dan
3e1242b63e
Replace ilm/slm with their full names (#88060) 2022-06-30 09:44:46 +01:00
Andrei Dan
4e869860d6
Fix StableMasterHealthIndicatorServiceTests and start-slm doc test (#87962) 2022-06-23 12:48:25 +01:00
Andrei Dan
6e98072db5
Add start slm user action (#87854)
This creates a user action for the slm health indicator that will help
the user to start SLM.
2022-06-23 11:04:45 +01:00
Andrei Dan
a4e7064b0e
Create ILM not running user action (#87852)
This creates a user action for the ilm health indicator that will help
the user to start ILM.
2022-06-23 09:54:31 +01:00
Andrei Dan
2ec4a9e006
Add troubleshooting doc for missing tier (#87526)
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
2022-06-17 12:24:33 +01:00
Keith Massey
6caf39c109
How to increase node capacity docs (#87188)
This adds troubleshooting documentation for the case when the ShardsAvailabilityHealthIndicatorService
reports that there are not enough nodes in the data tier (user action "increase_node_capacity_for_allocations" or
"increase_tier_capacity_for_allocations_"). This covers both the cloud and self-managed environments. For
cloud we first recommend increasing the number of availability zones (because you cannot directly add nodes), and
decreasing index.number_of_replicas if that is not possible. For self-managed, we first recommend adding nodes,
and decreasing index.number_of_replicas if that is not possible.
2022-06-08 14:06:47 -05:00
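The fallback described in the commit above (decreasing `index.number_of_replicas` when capacity cannot be added) can be sketched as an index settings update body. This is a minimal sketch, not the guide's actual procedure: the helper name `replicas_settings_body` and the choice to lower the count by exactly one are assumptions made for illustration.

```python
def replicas_settings_body(current_replicas):
    """Return an index settings body that lowers the replica count by one.

    Lowering by exactly one is an assumption of this sketch; the right
    target depends on how many nodes the tier actually has.
    """
    if current_replicas <= 0:
        raise ValueError("replica count is already zero and cannot be decreased")
    return {"index": {"number_of_replicas": current_replicas - 1}}


# Example: an index that currently has 2 replicas per primary shard.
settings_body = replicas_settings_body(2)
print(settings_body)
```

A body like this would be sent to the update index settings API for the affected index.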
Andrei Dan
15b8fc3151
Fix troubleshooting doc tab to point to self-managed (#87465)
This fixes the Self Managed tab to load the
self-managed instructions.
2022-06-07 16:29:54 +01:00
Andrei Dan
3fb60a1551
Update get kibana guide in troubleshooting docs (#87075) 2022-05-25 16:11:18 +01:00
Andrei Dan
08b323131f
Troubleshooting guides for disabled allocations (#86789)
This adds the troubleshooting guides for when index and cluster allocations are
disabled.

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
2022-05-24 10:27:15 +01:00
Andrei Dan
20802a9f66
Add migrate to tiers troubleshooting doc (#86738)
This adds a troubleshooting doc for indices that mix index filtering allocation
with data tiers routing.

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
2022-05-24 10:12:28 +01:00
Andrei Dan
490f417efd
Troubleshooting guide for diagnosing unassigned shards (#86996)
Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
2022-05-24 09:56:23 +01:00
Andrei Dan
21785c9a77
How-to docs for increasing the total number of shards per node (#86214)
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>
2022-05-10 09:13:27 +01:00