(Doc+) KB Diag (#188167)

👋 howdy, team! 

Expanding our new outline on pulling Support Diagnostics [for
Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/master/diagnostic.html),
this adds a new "Capture Diagnostics" page under Kibana > Troubleshooting.

Technically the KB Diag works [down to
v6.5.0](https://github.com/elastic/support-diagnostics/blob/main/src/main/resources/kibana-rest.yml)
but I put 7.11.0 since that's the Task Manager Health's earliest
version.

As needed, this page redirects the user back to [investigating
ES](https://www.elastic.co/guide/en/elasticsearch/reference/master/diagnostic.html)
and [fixing KB
dependencies](https://www.elastic.co/guide/en/kibana/master/access.html#not-ready)
first. And directs them forward to Support's [walkthrough investigating
the output](https://www.elastic.co/blog/troubleshooting-kibana-health).

TIA! 🙏

---------

Co-authored-by: florent-leborgne <florent.leborgne@elastic.co>
Stef Nestor 2024-07-25 10:52:39 -06:00 committed by GitHub
parent 2295ba10e0
commit 2e78ef08f7
2 changed files with 160 additions and 0 deletions


@ -0,0 +1,158 @@
[[kibana-diagnostic]]
=== Capturing diagnostics
++++
<titleabbrev>Capture diagnostics</titleabbrev>
++++
:keywords: Kibana diagnostic, diagnostics
The {kib} https://github.com/elastic/support-diagnostics[Support Diagnostic]
tool captures a point-in-time snapshot of {kib} and its Task Manager health.
It works on {kib} versions 7.11.0 and above.
You can use the information captured by the tool to troubleshoot problems with {kib} instances.
Check the
https://www.elastic.co/blog/troubleshooting-kibana-health[Troubleshooting
Kibana Health blog] for examples.
You can generate diagnostic information using this tool before you contact
https://support.elastic.co[Elastic Support] or
https://discuss.elastic.co[Elastic Discuss] to help get a timely answer.
[discrete]
[[kibana-diagnostic-tool-requirements]]
==== Requirements
- Java Runtime Environment or Java Development Kit v1.8 or higher.
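You can check the Java requirement before running the tool. The following is a minimal sketch, not part of the Support Diagnostic tool: the function names `java_major_version` and `java_meets_requirement` are hypothetical, and it assumes the first line of `java -version` contains a quoted version string such as `"1.8.0_281"` or `"17.0.2"`.

```shell
# Extract the major Java version from the first line of `java -version` output.
# Handles both the legacy "1.8.0_281" scheme and the modern "17.0.2" scheme.
java_major_version() {
  version_line=$1
  # Pull out the quoted version string, for example 1.8.0_281 or 17.0.2
  version=$(printf '%s\n' "$version_line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case $version in
    '')  return 1 ;;                                # no version string found
    1.*) printf '%s\n' "$version" | cut -d. -f2 ;;  # legacy scheme: second field
    *)   printf '%s\n' "$version" | cut -d. -f1 ;;  # modern scheme: first field
  esac
}

# Succeed when the given `java -version` line reports Java 8 (v1.8) or higher.
java_meets_requirement() {
  major=$(java_major_version "$1") || return 1
  [ "$major" -ge 8 ]
}

# Usage (java prints its version to stderr, so redirect it):
#   java_meets_requirement "$(java -version 2>&1 | head -n 1)" && echo "Java OK"
```

The parsing is kept separate from the `java` call so that you can adapt it to however your environment reports the version.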
[discrete]
[[kibana-diagnostic-tool-access]]
==== Access the tool
The Support Diagnostic tool is included out-of-the-box as a sub-library in:
* {ece} - Find the tool under **{ece}** > **Deployment** > **Operations** >
**Prepare Bundle** > **{kib}**.
* {eck} - Run the tool with https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-take-eck-dump.html[`eck-diagnostics`].
You can also get the latest version of the tool by downloading the `diagnostics-X.X.X-dist.zip` file from
https://github.com/elastic/support-diagnostics/releases/latest[the
`support-diagnostic` repo].
[discrete]
[[kibana-diagnostic-capture]]
==== Capture diagnostic information
To run a {kib} diagnostic:
. In a terminal, verify that your network and user permissions are sufficient
to connect by polling {kib}'s <<task-manager-api-health,Task Manager health>>.
+
For example, with the parameters `host:localhost`, `port:5601`, and
`username:elastic`, you'd use the following curl request. Adapt these parameters to your context.
+
[source,sh]
----
curl -X GET -k -H 'kbn-xsrf: true' -u elastic https://localhost:5601/api/task_manager/_health
----
// NOTCONSOLE
+
If you receive an HTTP 200 `OK` response, you can proceed to the
next step. If you receive a different response code, you must
<<kibana-diagnostic-non-200,diagnose the issue>> before proceeding.
. Using the same environment parameters, run the diagnostic tool script.
+
For information about the parameters that you can pass to the tool, refer
to the https://github.com/elastic/support-diagnostics#standard-options[diagnostic
parameter reference].
+
The following command options are recommended:
+
**Unix-based systems**
+
[source,sh]
----
sudo ./diagnostics.sh --type kibana-local --host localhost --port 5601 -u elastic -p --bypassDiagVerify --ssl --noVerify
----
+
**Windows**
+
[source,sh]
----
.\diagnostics.bat --type kibana-local --host localhost --port 5601 -u elastic -p --bypassDiagVerify --ssl --noVerify
----
+
[TIP]
.Script execution modes
====
You can execute the script in three https://github.com/elastic/support-diagnostics#diagnostic-types[modes]:
* `kibana-local` (default, recommended): Polls the <<api,{kib} API>>,
gathers operating system info, and captures cluster and garbage collection (GC) logs.
* `kibana-remote`: Establishes an SSH session
to the applicable target server to pull the same information as `kibana-local`.
* `kibana-api`: Polls the <<api,{kib} API>>. All other data must be
collected manually.
====
. When the script has completed, verify that no errors were logged to
`diagnostic.log`. If the log file contains errors, refer to
<<kibana-diagnostic-log-errors,Diagnose errors in `diagnostic.log`>>.
. If the script completed without errors, an archive with the
format `<diagnostic type>-diagnostics-<DateTimeStamp>.zip` is created in
the working directory or the output directory that you have specified. You can
review or share the diagnostic archive as needed.
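The connectivity check in the first step can be scripted so that the diagnostic only runs after an HTTP 200 response. The following is a minimal sketch under stated assumptions, not part of the Support Diagnostic tool: the function names `kibana_health_status` and `can_run_diagnostic` are hypothetical, and the URL and username are the example values used above. curl reports `000` as the status code when it receives no HTTP response at all.

```shell
# Print the HTTP status code returned by the Task Manager health endpoint.
# $1 = base URL (for example https://localhost:5601), $2 = username.
# Only the username is passed, so curl prompts for the password.
kibana_health_status() {
  curl -s -k -o /dev/null -w '%{http_code}' \
    -H 'kbn-xsrf: true' -u "$2" \
    "$1/api/task_manager/_health"
}

# Succeed only for HTTP 200, the one response that lets you proceed.
can_run_diagnostic() {
  [ "$1" = "200" ]
}

# Usage:
#   status=$(kibana_health_status https://localhost:5601 elastic)
#   if can_run_diagnostic "$status"; then
#     sudo ./diagnostics.sh --type kibana-local --host localhost --port 5601 \
#       -u elastic -p --bypassDiagVerify --ssl --noVerify
#   else
#     echo "Task Manager health returned HTTP $status" >&2
#   fi
```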
[discrete]
[[kibana-diagnostic-non-200]]
==== Diagnose a non-200 response
When you poll your Task Manager health, if you receive any response other
than `200 OK`, then the diagnostic tool might not work as intended.
Check the possible resolutions based on the error code that you get:
HTTP 401 `UNAUTHORIZED` (authentication failed)::
Additional information in the error usually indicates
that your `username:password` pair is invalid, or that your {es} `.security`
index is unavailable and you need to set up a temporary {es}
{ref}/file-realm.html[file-based realm] user with `role:superuser` to authenticate.
HTTP 403 `FORBIDDEN` (insufficient permissions)::
Your `username` is recognized but has insufficient permissions to run the
diagnostic. Either use a different username or increase the user's privileges.
HTTP 429 `TOO_MANY_REQUESTS` (for example, `circuit_breaking_exception`)::
The authentication and authorization were successful, but the {es} cluster isn't
responding to requests. This issue is usually intermittent and can happen
when the cluster is overwhelmed. In this case, resolve {es}'s health first before
returning to {kib}.
HTTP 502 `BAD_GATEWAY` or 504 `GATEWAY_TIMEOUT`::
Your network has trouble reaching {kib}. You might be using
a proxy or firewall. Consider running the diagnostic tool from a different
location, confirming your port, or using an IP instead of a URL domain.
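The resolutions above can be condensed into a small lookup for use in scripts. The following is a minimal sketch: `suggest_resolution` is a hypothetical helper, and its messages only paraphrase the guidance in this section.

```shell
# Map an HTTP status code from the Task Manager health check
# to the matching resolution described in this section.
suggest_resolution() {
  case $1 in
    200)     echo "OK: proceed with the diagnostic" ;;
    401)     echo "Check the username and password, or the .security index availability" ;;
    403)     echo "Use a different user or increase the user's privileges" ;;
    429)     echo "Resolve Elasticsearch health first, then poll Kibana again" ;;
    502|504) echo "Check the proxy, firewall, port, or try an IP instead of a domain" ;;
    *)       echo "HTTP $1 is not covered here: inspect the response body" ;;
  esac
}

# Usage:
#   suggest_resolution 429
```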
[discrete]
[[kibana-diagnostic-log-errors]]
==== Diagnose errors in `diagnostic.log`
The following are common errors that you might encounter when running
the diagnostic tool:
* `Error: Could not find or load main class com.elastic.support.diagnostics.DiagnosticApp`
+
This indicates that you accidentally downloaded the source code file
instead of `diagnostics-X.X.X-dist.zip` from the releases page.
* A `security_exception` that includes `is unauthorized for user`:
+
The provided user has insufficient admin permissions to run the diagnostic
tool. Use another user, or grant the user `role:superuser` privileges.
* `Kibana server is not ready yet`
+
This indicates issues with {kib}'s dependencies blocking full start-up.
To investigate, check <<not-ready,Troubleshoot {kib} UI error>>.
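To triage a log quickly, you can search it for the messages listed above. The following is a minimal sketch: `classify_diagnostic_errors` and the labels it prints are hypothetical, and it assumes the messages appear verbatim in the file you pass in (`diagnostic.log` by default, or saved terminal output).

```shell
# Print one label per known error found in the given log file.
# Returns 0 if at least one known error matched, 1 if none matched,
# and 2 if the file cannot be read.
classify_diagnostic_errors() {
  log_file=${1:-diagnostic.log}
  [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 2; }
  found=0
  # Source archive downloaded instead of diagnostics-X.X.X-dist.zip
  if grep -q 'Could not find or load main class' "$log_file"; then
    echo "wrong-download"; found=1
  fi
  # The user lacks the privileges needed to run the diagnostic
  if grep -q 'is unauthorized for user' "$log_file"; then
    echo "insufficient-permissions"; found=1
  fi
  # Kibana dependencies are blocking full start-up (case-insensitive match)
  if grep -qi 'server is not ready yet' "$log_file"; then
    echo "kibana-not-ready"; found=1
  fi
  [ "$found" -eq 1 ]
}

# Usage:
#   classify_diagnostic_errors diagnostic.log
```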


@ -5,6 +5,8 @@
* <<kibana-troubleshooting-kibana-server-logs, Using {kib} server logs>>
* <<kibana-troubleshooting-trace-query, Trace {es} query to the origin in {kib}>>
* <<kibana-diagnostic, Capture {kib} diagnostic>>
include::using-server-logs.asciidoc[]
include::trace-query.asciidoc[]
include::diagnostic.asciidoc[]