[8.9] [APM] Documentation updates (#160568) (#161031)

# Backport

This will backport the following commits from `main` to `8.9`:
- [[APM] Documentation updates
(#160568)](https://github.com/elastic/kibana/pull/160568)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Brandon
Morelli","email":"brandon.morelli@elastic.co"},"sourceCommit":{"committedDate":"2023-06-30T18:02:58Z","message":"[APM]
Documentation updates (#160568)\n\n### Summary\r\n\r\nThis PR makes a
handful of updates to the APM app documentation:\r\n\r\n- [x] **Alerts
tab, workflow, and grouping**\r\n- Rewrote most of our
[alerting\r\ndocumentation](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/apm-alerts.html#apm-alert-view-active)\r\nto
explain the new granularity level of alerts, address new alert
names,\r\nand explain the different ways to view active alerts.\r\n-
Updated
the\r\n[Services](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/services.html)\r\ndocumentation
to describe the alert badge and link to alerting docs.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2887_\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2888_\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2878_\r\n- [x]
**Infrastructure tab**\r\n- Added a new
[top-level\r\npage](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/infrastructure.html)\r\nexplaining
what this page is useful for.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2892_\r\n- [x]
**Log views and correlation**\r\n- Added a new [top-level
page\r\n](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/logs.html)that\r\nlinks
to our log correlation docs.\r\n- Updated our
[transaction\r\ndocumentation](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/transactions.html#transaction-trace-sample)\r\nwith
new information and a link to our log correlation docs.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2891_\r\n- [x]
**New AWS Lambda metrics**\r\n- Most of the new charts have tooltips
explaining what the charts do. I\r\nupdated the screenshot and added
some additional information to
the\r\n[overview](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/apm-lambda.html)\r\nto
highlight some of the new features of this page.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2890_\r\n- [x]
**New screenshots**\r\n- I updated any screenshots I noticed were
outdated while working on the\r\nabove content. Screenshot updates are
not necessarily related to the\r\nchanges described
above.","sha":"4ed60697e97b7120eec7d0130da28ca900ca90e9","branchLabelMapping":{"^v8.10.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","v8.8.0","v8.9.0","v8.10.0"],"number":160568,"url":"https://github.com/elastic/kibana/pull/160568","mergeCommit":{"message":"[APM]
Documentation updates (#160568)\n\n### Summary\r\n\r\nThis PR makes a
handful of updates to the APM app documentation:\r\n\r\n- [x] **Alerts
tab, workflow, and grouping**\r\n- Rewrote most of our
[alerting\r\ndocumentation](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/apm-alerts.html#apm-alert-view-active)\r\nto
explain the new granularity level of alerts, address new alert
names,\r\nand explain the different ways to view active alerts.\r\n-
Updated
the\r\n[Services](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/services.html)\r\ndocumentation
to describe the alert badge and link to alerting docs.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2887_\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2888_\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2878_\r\n- [x]
**Infrastructure tab**\r\n- Added a new
[top-level\r\npage](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/infrastructure.html)\r\nexplaining
what this page is useful for.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2892_\r\n- [x]
**Log views and correlation**\r\n- Added a new [top-level
page\r\n](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/logs.html)that\r\nlinks
to our log correlation docs.\r\n- Updated our
[transaction\r\ndocumentation](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/transactions.html#transaction-trace-sample)\r\nwith
new information and a link to our log correlation docs.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2891_\r\n- [x]
**New AWS Lambda metrics**\r\n- Most of the new charts have tooltips
explaining what the charts do. I\r\nupdated the screenshot and added
some additional information to
the\r\n[overview](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/apm-lambda.html)\r\nto
highlight some of the new features of this page.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2890_\r\n- [x]
**New screenshots**\r\n- I updated any screenshots I noticed were
outdated while working on the\r\nabove content. Screenshot updates are
not necessarily related to the\r\nchanges described
above.","sha":"4ed60697e97b7120eec7d0130da28ca900ca90e9"}},"sourceBranch":"main","suggestedTargetBranches":["8.8","8.9"],"targetPullRequestStates":[{"branch":"8.8","label":"v8.8.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"8.9","label":"v8.9.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.10.0","labelRegex":"^v8.10.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/160568","number":160568,"mergeCommit":{"message":"[APM]
Documentation updates (#160568)\n\n### Summary\r\n\r\nThis PR makes a
handful of updates to the APM app documentation:\r\n\r\n- [x] **Alerts
tab, workflow, and grouping**\r\n- Rewrote most of our
[alerting\r\ndocumentation](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/apm-alerts.html#apm-alert-view-active)\r\nto
explain the new granularity level of alerts, address new alert
names,\r\nand explain the different ways to view active alerts.\r\n-
Updated
the\r\n[Services](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/services.html)\r\ndocumentation
to describe the alert badge and link to alerting docs.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2887_\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2888_\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2878_\r\n- [x]
**Infrastructure tab**\r\n- Added a new
[top-level\r\npage](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/infrastructure.html)\r\nexplaining
what this page is useful for.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2892_\r\n- [x]
**Log views and correlation**\r\n- Added a new [top-level
page\r\n](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/logs.html)that\r\nlinks
to our log correlation docs.\r\n- Updated our
[transaction\r\ndocumentation](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/transactions.html#transaction-trace-sample)\r\nwith
new information and a link to our log correlation docs.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2891_\r\n- [x]
**New AWS Lambda metrics**\r\n- Most of the new charts have tooltips
explaining what the charts do. I\r\nupdated the screenshot and added
some additional information to
the\r\n[overview](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/apm-lambda.html)\r\nto
highlight some of the new features of this page.\r\n - _Closes
https://github.com/elastic/observability-docs/issues/2890_\r\n- [x]
**New screenshots**\r\n- I updated any screenshots I noticed were
outdated while working on the\r\nabove content. Screenshot updates are
not necessarily related to the\r\nchanges described
above.","sha":"4ed60697e97b7120eec7d0130da28ca900ca90e9"}}]}]
BACKPORT-->

Co-authored-by: Brandon Morelli <brandon.morelli@elastic.co>
Kibana Machine 2023-06-30 14:28:05 -04:00 committed by GitHub
parent fd65b756d9
commit 63faa6723e
19 changed files with 122 additions and 42 deletions

View file

@ -11,44 +11,52 @@ and trigger built-in **actions** when those conditions are met.
The following **rules** are supported:
* Latency anomaly rule:
Alert when latency of a service is abnormal
* Transaction error rate threshold rule:
Alert when the service's transaction error rate is above the defined threshold
* Error count threshold rule:
Alert when the number of errors in a service exceeds a defined threshold
* **Threshold rule**:
Alert when the latency or failed transaction rate is abnormal.
Threshold rules can be as broad or as granular as you'd like, enabling you to define exactly when you want to be alerted--whether that's at the environment level, service name level, transaction type level, and/or transaction name level.
* **Anomaly rule**:
Alert when the latency of a service is anomalous. Anomaly rules can be set at the environment level, service level, and/or transaction type level.
* **Error count rule**:
Alert when the number of errors in a service exceeds a defined threshold. Error count rules can be set at the environment level, service level, and error group level.
[role="screenshot"]
image::apm/images/apm-alert.png[Create an alert in the APM app]
Below, we'll walk through the creation of two APM rules.
For a complete walkthrough of the **Create rule** flyout panel, including detailed information on each configurable property,
see Kibana's <<create-edit-rules,create and edit rules>>.
Below, we'll walk through the creation of two APM rules.
[float]
[[apm-create-transaction-alert]]
=== Example: create a latency anomaly rule
Latency anomaly rules trigger when the latency of a service is abnormal.
Because some parts of an application are more important than others, and have a different
tolerance for latency, we'll target a specific transaction within a service.
Before continuing, identify the service name, transaction type, and environment that you'd like to create a latency anomaly rule for.
This guide will create an alert for all services based on the following criteria:
* Environment: production
* Service: `{your_service.name}`
* Transaction: `{your_transaction.name}`
* Environment: `{your_service.environment}`
* Severity level: critical
* Run every five minutes
* Send an alert to a Slack channel only when the rule status changes
* Check every five minutes
* Send an alert to a Slack channel when the rule status changes
From any page in the APM app, select **Alerts and rules** > **Latency** > **Create anomaly rule**.
Change the name of the alert, but do not edit the tags.
From any page in the APM app, select **Alerts and rules** > **Create anomaly rule**.
Change the name of the rule, but do not edit the tags.
Based on the criteria above, define the following rule details:
* **Check every** - `5 minutes`
* **Notify** - "Only on status change"
* **Environment** - `all`
* **Service** - `{your_service.name}`
* **Type** - `{your_transaction.name}`
* **Environment** - `{your_service.environment}`
* **Has anomaly with severity** - `critical`
* **Check every** - `5 minutes`
Next, add a connector. Multiple connectors can be selected, but in this example we're interested in Slack.
Next, add a connector type. Multiple connectors can be selected, but in this example we're interested in Slack.
Select **Slack** > **Create a connector**.
Enter a name for the connector,
and paste your Slack webhook URL.
@ -60,30 +68,40 @@ to pass additional alert values at the time a condition is detected to an action
A list of available variables can be accessed by selecting the
**add variable** button image:apm/images/add-variable.png[add variable button].
Click **Save**. The rule has been created and is now active!
Click **Save**. Your rule has been created and is now active!
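The rule built in this example could also be expressed as a request body for Kibana's alerting HTTP API instead of being entered in the flyout. The following is a minimal sketch, not taken from the docs in this commit: the rule type ID `apm.anomaly`, the parameter names (`anomalySeverityType`, `transactionType`, `windowSize`, `windowUnit`), the action group `threshold_met`, and the helper `build_anomaly_rule` are all assumptions that should be checked against the alerting API reference for your Kibana version. The sketch only builds the body; it does not send anything.

```python
import json


def build_anomaly_rule(service_name, transaction_type, environment, connector_id):
    """Return a request body for the latency anomaly rule described above.

    All field names are assumptions modeled on Kibana's alerting API;
    verify them before use.
    """
    return {
        "name": f"Latency anomaly: {service_name}",
        "rule_type_id": "apm.anomaly",  # assumed ID of the APM anomaly rule type
        "consumer": "apm",
        "schedule": {"interval": "5m"},  # Check every: 5 minutes
        "params": {
            "serviceName": service_name,
            "transactionType": transaction_type,
            "environment": environment,
            "anomalySeverityType": "critical",  # Has anomaly with severity: critical
            "windowSize": 30,  # assumed lookback window; not specified in the guide
            "windowUnit": "m",
        },
        "actions": [
            {
                "id": connector_id,  # ID of an existing Slack connector
                "group": "threshold_met",  # assumed action group name
                "params": {"message": "{{rule.name}} is active"},
                # Notify only when the rule status changes
                "frequency": {
                    "summary": False,
                    "notify_when": "onActionGroupChange",
                    "throttle": None,
                },
            }
        ],
    }


if __name__ == "__main__":
    body = build_anomaly_rule(
        "my-service", "request", "production", "my-slack-connector-id"
    )
    print(json.dumps(body, indent=2))
```

The service name, transaction type, environment, and connector ID passed in the usage lines are placeholders, in the same spirit as `{your_service.name}` above.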
[float]
[[apm-create-error-alert]]
=== Example: create an error count threshold alert
The error count threshold alert triggers when the number of errors in a service exceeds a defined threshold.
This guide will create an alert for all services based on the following criteria:
Because some errors are more important than others, this guide will focus on a specific error group ID.
* All environments
* Error rate is above 25 for the last minute
* Check every 1 minute, and alert every time the rule is active
Before continuing, identify the service name, environment name, and error group ID that you'd like to create an error count rule for.
The easiest way to find an error group ID is to select the service that you're interested in and navigate to the **Errors** tab.
This guide will create an alert for an error group ID based on the following criteria:
* Service: `{your_service.name}`
* Environment: `{your_service.environment}`
* Error Grouping Key: `{your_error.ID}`
* Error rate is above 25 errors for the last five minutes
* Group alerts by `service.name` and `service.environment`
* Check every 1 minute
* Send the alert via email to the site reliability team
From any page in the APM app, select **Alerts and rules** > **Error count** > **Create threshold rule**.
From any page in the APM app, select **Alerts and rules** > **Create error count rule**.
Change the name of the alert, but do not edit the tags.
Based on the criteria above, define the following rule details:
* **Check every** - `1 minute`
* **Notify** - "Every time alert is active"
* **Environment** - `all`
* **Service**: `{your_service.name}`
* **Environment**: `{your_service.environment}`
* **Error Grouping Key**: `{your_error.ID}`
* **Is above** - `25 errors`
* **For the last** - `1 minute`
* **For the last** - `5 minutes`
* **Group alerts by** - `service.name` `service.environment`
* **Check every** - `1 minute`
Select the **Email** connector and click **Create a connector**.
Fill out the required details: sender, host, port, etc., and click **save**.
@ -96,6 +114,32 @@ A list of available variables can be accessed by selecting the
Click **Save**. The alert has been created and is now active!
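For readers who prefer to script rule creation, the error count rule above can be sketched as an HTTP request. This is a hedged illustration rather than the documented workflow: the endpoint `POST /api/alerting/rule`, the rule type ID `apm.error_rate`, the parameter names (`errorGroupingKey`, `groupBy`, `threshold`, `windowSize`, `windowUnit`), the action group `threshold_met`, the email address, and the helper `build_error_count_request` are assumptions to verify against your Kibana version. Authentication is deliberately left out, and the request is constructed but never sent.

```python
import json
import urllib.request


def build_error_count_request(kibana_url, service_name, environment,
                              error_grouping_key, connector_id):
    """Build (but do not send) a POST request that would create the rule."""
    body = {
        "name": f"Error count: {service_name}",
        "rule_type_id": "apm.error_rate",  # assumed ID of the error count rule type
        "consumer": "apm",
        "schedule": {"interval": "1m"},  # Check every: 1 minute
        "params": {
            "serviceName": service_name,
            "environment": environment,
            "errorGroupingKey": error_grouping_key,
            "threshold": 25,  # Is above: 25 errors
            "windowSize": 5,  # For the last: 5 minutes
            "windowUnit": "m",
            "groupBy": ["service.name", "service.environment"],
        },
        "actions": [
            {
                "id": connector_id,  # ID of an existing Email connector
                "group": "threshold_met",  # assumed action group name
                "params": {
                    "to": ["sre-team@example.com"],  # hypothetical recipient
                    "subject": "Error count threshold exceeded",
                    "message": "{{rule.name}} is active",
                },
            }
        ],
    }
    return urllib.request.Request(
        url=kibana_url.rstrip("/") + "/api/alerting/rule",
        data=json.dumps(body).encode("utf-8"),
        # Kibana's HTTP APIs expect a kbn-xsrf header on write requests
        headers={"Content-Type": "application/json", "kbn-xsrf": "true"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_error_count_request(
        "http://localhost:5601", "my-service", "production",
        "my-error-group-id", "my-email-connector-id",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually create the rule, the returned request would need credentials added (for example an API key header) before being passed to `urllib.request.urlopen`.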
[float]
[[apm-alert-view-active]]
=== View active alerts
Active alerts are displayed and grouped in multiple ways in the APM app.
[float]
[[apm-alert-view-group]]
==== View alerts by service group
If you're using the <<service-groups,service groups>> feature, you can view alerts by service group.
From the service group overview page, click the red alert indicator to open the **Alerts** tab with a predefined filter that matches the filter used when creating the service group.
[role="screenshot"]
image::apm/images/apm-service-group.png[Example view of service group in the APM app in Kibana]
[float]
[[apm-alert-view-service]]
==== View alerts by service
Alerts can be viewed within the context of any service.
After selecting a service, go to the **Alerts** tab to view any alerts that are active for the selected service.
[role="screenshot"]
image::apm/images/active-alert-service.png[View active alerts by service]
[float]
[[apm-alert-manage]]
=== Manage alerts and rules

View file

@ -40,6 +40,8 @@ Notice something awry? Select a service or trace and dive deeper with:
* <<spans>>
* <<errors>>
* <<metrics>>
* <<infrastructure>>
* <<logs>>
TIP: Want to learn more about the Elastic APM ecosystem?
See the {apm-guide-ref}/apm-overview.html[APM Overview].
@ -63,3 +65,7 @@ include::spans.asciidoc[]
include::errors.asciidoc[]
include::metrics.asciidoc[]
include::infrastructure.asciidoc[]
include::logs.asciidoc[]

Binary file not shown. After: 392 KiB

Binary file not shown. Before: 413 KiB. After: 413 KiB

Binary file not shown. Before: 362 KiB. After: 221 KiB

Binary file not shown. Before: 580 KiB. After: 435 KiB

Binary file not shown. Before: 253 KiB. After: 200 KiB

Binary file not shown. Before: 453 KiB. After: 370 KiB

BIN
docs/apm/images/infra.png Normal file

Binary file not shown. After: 158 KiB

Binary file not shown. Before: 210 KiB

Binary file not shown. After: 1 MiB

BIN
docs/apm/images/logs.png Normal file

Binary file not shown. After: 253 KiB

Binary file not shown. Before: 1.2 MiB. After: 443 KiB

View file

@ -0,0 +1,13 @@
[role="xpack"]
[[infrastructure]]
=== Infrastructure
The *Infrastructure* tab provides information about the containers, pods, and hosts
that the selected service is linked to.
[role="screenshot"]
image::apm/images/infra.png[Example view of the Infrastructure tab in APM app in Kibana]
IT ops and software reliability engineers (SREs) can use this tab
to quickly find a service's underlying infrastructure resources when debugging a problem.
Knowing what infrastructure is related to a service allows you to remediate issues by restarting, killing hanging instances, changing configuration, rolling back deployments, scaling up, scaling out, etc.

View file

@ -3,11 +3,15 @@
=== Observe Lambda functions
Elastic APM provides performance and error monitoring for AWS Lambda functions.
Get insight into function execution and runtime behavior, as well as visibility into how your Lambda functions relate to and depend on other services.
See how your Lambda functions relate to and depend on other services, and
get insight into function execution and runtime behavior, like lambda duration, cold start rate, cold start duration, compute usage, memory usage, and more.
To set up Lambda monitoring, see the relevant
{apm-guide-ref}/monitoring-aws-lambda.html[quick start guide].
[role="screenshot"]
image::apm/images/lambda-overview.png[lambda overview]
[float]
[[apm-lambda-cold-start-info]]
==== Cold starts
@ -22,9 +26,6 @@ Cold starts are an unavoidable byproduct of the serverless world, but visibility
The cold start rate (i.e. proportion of requests that experience a cold start) is displayed per service and per transaction.
[role="screenshot"]
image::apm/images/lambda-cold-start.png[lambda cold start graph]
Cold start is also displayed in the trace waterfall, where you can drill-down into individual traces and see trace metadata like AWS request ID, trigger type, and trigger request ID.
[role="screenshot"]

19
docs/apm/logs.asciidoc Normal file
View file

@ -0,0 +1,19 @@
[role="xpack"]
[[logs]]
=== Logs
The *Logs* tab shows contextual logs for the selected service.
// tag::log-overview[]
Logs provide detailed information about specific events, and are crucial to successfully debugging slow or erroneous transactions.
If you've correlated your application's logs and traces, you never have to search for relevant data; it's already available to you. Viewing log and trace data together allows you to quickly diagnose and solve problems.
To learn how to correlate your logs with your instrumented services,
see {observability-guide}/application-logs.html[log correlation].
// end::log-overview[]
[role="screenshot"]
image::apm/images/logs.png[Example view of the Logs tab in APM app in Kibana]
TIP: Logs displayed on this page are filtered on `service.name`.

View file

@ -139,9 +139,6 @@ or when to remove a large dependency.
The cold start rate chart is currently supported for <<apm-lambda-cold-start-info,AWS Lambda>>
functions and Azure functions.
[role="screenshot"]
image::apm/images/lambda-cold-start.png[lambda cold start graph]
[discrete]
[[service-instances]]
=== Instances

View file

@ -10,6 +10,8 @@ To help surface potential issues, services are sorted by their health status:
Health status is powered by <<machine-learning-integration,machine learning>>
and requires anomaly detection to be enabled.
In addition to health status, active alerts for each service are prominently displayed in the service inventory table. Selecting an active alert badge brings you to the <<apm-alerts,Alerts>> tab where you can learn more about the active alert and take action.
[role="screenshot"]
image::apm/images/apm-services-overview.png[Example view of services table the APM app in Kibana]
@ -17,11 +19,14 @@ image::apm/images/apm-services-overview.png[Example view of services table the A
[[service-groups]]
==== Service groups
preview::[]
beta::[]
Group services together to build meaningful views that remove noise and simplify investigations across services.
Group services together to build meaningful views that remove noise, simplify investigations across services,
and <<apm-alert-view-group,combine related alerts>>.
Service groups are {kib} space-specific and available for any users with appropriate access.
// This screenshot is reused in the alerts docs
// Ensure it has an active alert showing
[role="screenshot"]
image::apm/images/apm-service-group.png[Example view of service group in the APM app in Kibana]

View file

@ -162,12 +162,7 @@ This means you can select "Actions - View transaction in Discover" to see the ac
The *Logs* tab displays logs related to the sampled trace.
Logs provide detailed information about specific events,
and are crucial to successfully debugging slow or erroneous transactions.
If you've correlated your application's logs and traces, you never have to search for relevant data;
it's all provided on this. Viewing log and trace data together allows you to quickly diagnose
and solve problems.
include::./logs.asciidoc[tag=log-overview]
[role="screenshot"]
image::apm/images/apm-logs-tab.png[APM logs tab]