# Backport

This will backport the following commits from `main` to `8.9`:
- [[APM] Documentation updates (#160568)](https://github.com/elastic/kibana/pull/160568)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Source commit message ([APM] Documentation updates (#160568), `4ed60697e97b7120eec7d0130da28ca900ca90e9`, committed 2023-06-30T18:02:58Z):

### Summary

This PR makes a handful of updates to the APM app documentation:

- [x] **Alerts tab, workflow, and grouping**
  - Rewrote most of our [alerting documentation](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/apm-alerts.html#apm-alert-view-active) to explain the new granularity level of alerts, address new alert names, and explain the different ways to view active alerts.
  - Updated the [Services](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/services.html) documentation to describe the alert badge and link to alerting docs.
  - _Closes https://github.com/elastic/observability-docs/issues/2887_
  - _Closes https://github.com/elastic/observability-docs/issues/2888_
  - _Closes https://github.com/elastic/observability-docs/issues/2878_
- [x] **Infrastructure tab**
  - Added a new [top-level page](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/infrastructure.html) explaining what this page is useful for.
  - _Closes https://github.com/elastic/observability-docs/issues/2892_
- [x] **Log views and correlation**
  - Added a new [top-level page](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/logs.html) that links to our log correlation docs.
  - Updated our [transaction documentation](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/transactions.html#transaction-trace-sample) with new information and a link to our log correlation docs.
  - _Closes https://github.com/elastic/observability-docs/issues/2891_
- [x] **New AWS Lambda metrics**
  - Most of the new charts have tooltips explaining what the charts do. I updated the screenshot and added some additional information to the [overview](https://kibana_160568.docs-preview.app.elstc.co/guide/en/kibana/master/apm-lambda.html) to highlight some of the new features of this page.
  - _Closes https://github.com/elastic/observability-docs/issues/2890_
- [x] **New screenshots**
  - I updated any screenshots I noticed were outdated while working on the above content. Screenshot updates are not necessarily related to the changes described above.

Co-authored-by: Brandon Morelli <brandon.morelli@elastic.co>
|
@@ -11,44 +11,52 @@ and trigger built-in **actions** when those conditions are met.
|
|||
|
||||
The following **rules** are supported:
|
||||
|
||||
* Latency anomaly rule:
|
||||
Alert when latency of a service is abnormal
|
||||
* Transaction error rate threshold rule:
|
||||
Alert when the service's transaction error rate is above the defined threshold
|
||||
* Error count threshold rule:
|
||||
Alert when the number of errors in a service exceeds a defined threshold
|
||||
* **Threshold rule**:
|
||||
Alert when the latency or failed transaction rate is abnormal.
|
||||
Threshold rules can be as broad or as granular as you'd like, enabling you to define exactly when you want to be alerted--whether that's at the environment level, service name level, transaction type level, and/or transaction name level.
|
||||
* **Anomaly rule**:
|
||||
Alert when the latency of a service is anomalous. Anomaly rules can be set at the environment level, service level, and/or transaction type level.
|
||||
* **Error count rule**:
|
||||
Alert when the number of errors in a service exceeds a defined threshold. Error count rules can be set at the environment level, service level, and error group level.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-alert.png[Create an alert in the APM app]
|
||||
|
||||
Below, we'll walk through the creation of two APM rules.
|
||||
|
||||
For a complete walkthrough of the **Create rule** flyout panel, including detailed information on each configurable property,
|
||||
see Kibana's <<create-edit-rules,create and edit rules>>.
|
||||
|
||||
Below, we'll walk through the creation of two APM rules.
|
||||
|
||||
[float]
|
||||
[[apm-create-transaction-alert]]
|
||||
=== Example: create a latency anomaly rule
|
||||
|
||||
Latency anomaly rules trigger when the latency of a service is abnormal.
|
||||
Because some parts of an application are more important than others, and have a different
|
||||
tolerance for latency, we'll target a specific transaction within a service.
|
||||
|
||||
Before continuing, identify the service name, transaction type, and environment that you'd like to create a latency anomaly rule for.
|
||||
This guide will create an alert for all services based on the following criteria:
|
||||
|
||||
* Environment: production
|
||||
* Service: `{your_service.name}`
|
||||
* Transaction: `{your_transaction.name}`
|
||||
* Environment: `{your_service.environment}`
|
||||
* Severity level: critical
|
||||
* Run every five minutes
|
||||
* Send an alert to a Slack channel only when the rule status changes
|
||||
* Check every five minutes
|
||||
* Send an alert to a Slack channel when the rule status changes
|
||||
|
||||
From any page in the APM app, select **Alerts and rules** > **Latency** > **Create anomaly rule**.
|
||||
Change the name of the alert, but do not edit the tags.
|
||||
From any page in the APM app, select **Alerts and rules** > **Create anomaly rule**.
|
||||
Change the name of the rule, but do not edit the tags.
|
||||
|
||||
Based on the criteria above, define the following rule details:
|
||||
|
||||
* **Check every** - `5 minutes`
|
||||
* **Notify** - "Only on status change"
|
||||
* **Environment** - `all`
|
||||
* **Service** - `{your_service.name}`
|
||||
* **Type** - `{your_transaction.name}`
|
||||
* **Environment** - `{your_service.environment}`
|
||||
* **Has anomaly with severity** - `critical`
|
||||
* **Check every** - `5 minutes`
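The rule details above are entered in the **Create rule** flyout, but the same settings can be pictured as a request body for Kibana's alerting HTTP API. The following is only a sketch that builds such a body without sending it. The rule type ID `apm.anomaly`, the `consumer` value, and every key inside `params` are assumptions that this page does not confirm, and the function name and sample values are hypothetical; check them against the API reference for your Kibana version before relying on them.

```python
import json


def build_anomaly_rule_body(service_name, transaction_type, environment):
    """Return a dict mirroring the rule details listed above.

    Assumption: the field names follow Kibana's create-rule API as best
    understood by the editor; they are not taken from this document.
    """
    return {
        "name": "Latency anomaly: " + service_name,
        "rule_type_id": "apm.anomaly",    # assumed rule type ID
        "consumer": "apm",                # assumed consumer
        "schedule": {"interval": "5m"},   # "Check every" - 5 minutes
        "params": {                       # assumed parameter names
            "serviceName": service_name,
            "transactionType": transaction_type,
            "environment": environment,
            "anomalySeverityType": "critical",  # "Has anomaly with severity"
        },
    }


# Hypothetical sample values standing in for the placeholders above.
body = build_anomaly_rule_body("my-service", "request", "production")
print(json.dumps(body, indent=2))
```

Keeping the body in one function makes it easy to compare each key with the corresponding field in the flyout.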
|
||||
|
||||
Next, add a connector. Multiple connectors can be selected, but in this example we're interested in Slack.
|
||||
Next, add a connector type. Multiple connectors can be selected, but in this example we're interested in Slack.
|
||||
Select **Slack** > **Create a connector**.
|
||||
Enter a name for the connector,
|
||||
and paste your Slack webhook URL.
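For context, a Slack incoming webhook is an HTTPS endpoint that accepts a JSON body with a `text` field. Kibana's Slack connector sends this request for you once the URL is saved. The sketch below only constructs an equivalent request so you can see what the webhook receives; the URL is a placeholder and the function name and message text are hypothetical.

```python
import json
import urllib.request

# Placeholder: substitute the webhook URL generated in your Slack workspace.
WEBHOOK_URL = "https://hooks.slack.com/services/EXAMPLE"


def build_slack_request(webhook_url, message):
    """Build (but do not send) a POST request carrying a Slack message."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_slack_request(WEBHOOK_URL, "Latency anomaly rule changed status")
# Calling urllib.request.urlopen(request) would deliver the message.
print(request.get_method(), request.full_url)
```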
|
||||
|
@@ -60,30 +68,40 @@ to pass additional alert values at the time a condition is detected to an action
|
|||
A list of available variables can be accessed by selecting the
|
||||
**add variable** button image:apm/images/add-variable.png[add variable button].
|
||||
|
||||
Click **Save**. The rule has been created and is now active!
|
||||
Click **Save**. Your rule has been created and is now active!
|
||||
|
||||
[float]
|
||||
[[apm-create-error-alert]]
|
||||
=== Example: create an error count threshold alert
|
||||
|
||||
The error count threshold alert triggers when the number of errors in a service exceeds a defined threshold.
|
||||
This guide will create an alert for all services based on the following criteria:
|
||||
Because some errors are more important than others, this guide will focus on a specific error group ID.
|
||||
|
||||
* All environments
|
||||
* Error rate is above 25 for the last minute
|
||||
* Check every 1 minute, and alert every time the rule is active
|
||||
Before continuing, identify the service name, environment name, and error group ID that you'd like to create an error count rule for.
|
||||
The easiest way to find an error group ID is to select the service that you're interested in and navigate to the **Errors** tab.
|
||||
|
||||
This guide will create an alert for an error group ID based on the following criteria:
|
||||
|
||||
* Service: `{your_service.name}`
|
||||
* Environment: `{your_service.environment}`
|
||||
* Error Grouping Key: `{your_error.ID}`
|
||||
* Error rate is above 25 errors for the last five minutes
|
||||
* Group alerts by `service.name` and `service.environment`
|
||||
* Check every 1 minute
|
||||
* Send the alert via email to the site reliability team
|
||||
|
||||
From any page in the APM app, select **Alerts and rules** > **Error count** > **Create threshold rule**.
|
||||
From any page in the APM app, select **Alerts and rules** > **Create error count rule**.
|
||||
Change the name of the alert, but do not edit the tags.
|
||||
|
||||
Based on the criteria above, define the following rule details:
|
||||
|
||||
* **Check every** - `1 minute`
|
||||
* **Notify** - "Every time alert is active"
|
||||
* **Environment** - `all`
|
||||
* **Service**: `{your_service.name}`
|
||||
* **Environment**: `{your_service.environment}`
|
||||
* **Error Grouping Key**: `{your_error.ID}`
|
||||
* **Is above** - `25 errors`
|
||||
* **For the last** - `1 minute`
|
||||
* **For the last** - `5 minutes`
|
||||
* **Group alerts by** - `service.name` `service.environment`
|
||||
* **Check every** - `1 minute`
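To make the threshold and grouping settings concrete, here is a small illustrative model of the check: count the errors for one error grouping key in the last five minutes, group the counts by `service.name` and `service.environment`, and report any group above 25. This is a sketch of the idea, not how Kibana evaluates the rule; the event shape, the function name, and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def groups_over_threshold(errors, now, grouping_key, threshold=25,
                          window=timedelta(minutes=5)):
    """Return {(service name, environment): count} for groups whose error
    count inside the window is strictly above the threshold."""
    cutoff = now - window
    counts = Counter(
        (e["service.name"], e["service.environment"])
        for e in errors
        if e["error.grouping_key"] == grouping_key
        and cutoff < e["timestamp"] <= now
    )
    return {group: n for group, n in counts.items() if n > threshold}


now = datetime(2023, 6, 30, 18, 0, 0)

# 26 errors for "checkout" within the last five minutes (hypothetical data).
errors = [
    {"service.name": "checkout", "service.environment": "production",
     "error.grouping_key": "abc123",
     "timestamp": now - timedelta(seconds=10 * i)}
    for i in range(26)
]
# 3 errors for "cart", which stays below the threshold.
errors += [
    {"service.name": "cart", "service.environment": "production",
     "error.grouping_key": "abc123",
     "timestamp": now - timedelta(seconds=30 * i)}
    for i in range(3)
]

alerts = groups_over_threshold(errors, now, "abc123")
print(alerts)
```

Because the counts are grouped, each service and environment pair that crosses the threshold would produce its own alert rather than one alert for the whole rule.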
|
||||
|
||||
Select the **Email** connector and click **Create a connector**.
|
||||
Fill out the required details: sender, host, port, etc., and click **save**.
|
||||
|
@@ -96,6 +114,32 @@ A list of available variables can be accessed by selecting the
|
|||
|
||||
Click **Save**. The alert has been created and is now active!
|
||||
|
||||
[float]
|
||||
[[apm-alert-view-active]]
|
||||
=== View active alerts
|
||||
|
||||
Active alerts are displayed and grouped in multiple ways in the APM app.
|
||||
|
||||
[float]
|
||||
[[apm-alert-view-group]]
|
||||
==== View alerts by service group
|
||||
|
||||
If you're using the <<service-groups,service groups>> feature, you can view alerts by service group.
|
||||
From the service group overview page, click the red alert indicator to open the **Alerts** tab with a predefined filter that matches the filter used when creating the service group.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-service-group.png[Example view of service group in the APM app in Kibana]
|
||||
|
||||
[float]
|
||||
[[apm-alert-view-service]]
|
||||
==== View alerts by service
|
||||
|
||||
Alerts can be viewed within the context of any service.
|
||||
After selecting a service, go to the **Alerts** tab to view any alerts that are active for the selected service.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/active-alert-service.png[View active alerts by service]
|
||||
|
||||
[float]
|
||||
[[apm-alert-manage]]
|
||||
=== Manage alerts and rules
|
||||
|
|
|
@@ -40,6 +40,8 @@ Notice something awry? Select a service or trace and dive deeper with:
|
|||
* <<spans>>
|
||||
* <<errors>>
|
||||
* <<metrics>>
|
||||
* <<infrastructure>>
|
||||
* <<logs>>
|
||||
|
||||
TIP: Want to learn more about the Elastic APM ecosystem?
|
||||
See the {apm-guide-ref}/apm-overview.html[APM Overview].
|
||||
|
@@ -63,3 +65,7 @@ include::spans.asciidoc[]
|
|||
include::errors.asciidoc[]
|
||||
|
||||
include::metrics.asciidoc[]
|
||||
|
||||
include::infrastructure.asciidoc[]
|
||||
|
||||
include::logs.asciidoc[]
|
BIN docs/apm/images/active-alert-service.png (Normal file) - After: 392 KiB
BIN (image path not shown) - Before: 413 KiB, After: 413 KiB
BIN (image path not shown) - Before: 362 KiB, After: 221 KiB
BIN (image path not shown) - Before: 580 KiB, After: 435 KiB
BIN (image path not shown) - Before: 253 KiB, After: 200 KiB
BIN (image path not shown) - Before: 453 KiB, After: 370 KiB
BIN docs/apm/images/infra.png (Normal file) - After: 158 KiB
BIN (image path not shown) - Before: 210 KiB
BIN docs/apm/images/lambda-overview.png (Normal file) - After: 1 MiB
BIN docs/apm/images/logs.png (Normal file) - After: 253 KiB
BIN (image path not shown) - Before: 1.2 MiB, After: 443 KiB
13
docs/apm/infrastructure.asciidoc
Normal file
|
@@ -0,0 +1,13 @@
|
|||
[role="xpack"]
|
||||
[[infrastructure]]
|
||||
=== Infrastructure
|
||||
|
||||
The *Infrastructure* tab provides information about the containers, pods, and hosts
|
||||
that the selected service is linked to.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/infra.png[Example view of the Infrastructure tab in the APM app in Kibana]
|
||||
|
||||
IT ops and site reliability engineers (SREs) can use this tab
|
||||
to quickly find a service's underlying infrastructure resources when debugging a problem.
|
||||
Knowing what infrastructure is related to a service allows you to remediate issues by restarting, killing hanging instances, changing configuration, rolling back deployments, scaling up, scaling out, etc.
|
|
@@ -3,11 +3,15 @@
|
|||
=== Observe Lambda functions
|
||||
|
||||
Elastic APM provides performance and error monitoring for AWS Lambda functions.
|
||||
Get insight into function execution and runtime behavior, as well as visibility into how your Lambda functions relate to and depend on other services.
|
||||
See how your Lambda functions relate to and depend on other services, and
|
||||
get insight into function execution and runtime behavior, like lambda duration, cold start rate, cold start duration, compute usage, memory usage, and more.
|
||||
|
||||
To set up Lambda monitoring, see the relevant
|
||||
{apm-guide-ref}/monitoring-aws-lambda.html[quick start guide].
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/lambda-overview.png[lambda overview]
|
||||
|
||||
[float]
|
||||
[[apm-lambda-cold-start-info]]
|
||||
==== Cold starts
|
||||
|
@@ -22,9 +26,6 @@ Cold starts are an unavoidable byproduct of the serverless world, but visibility
|
|||
|
||||
The cold start rate (i.e. proportion of requests that experience a cold start) is displayed per service and per transaction.
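As a worked illustration of that definition, the rate is the number of invocations that experienced a cold start divided by the total number of invocations. The sketch below uses a hypothetical record shape with a boolean `cold_start` key; it illustrates the arithmetic only and says nothing about how the APM app computes the chart.

```python
def cold_start_rate(invocations):
    """Return the proportion of invocations flagged as cold starts.

    Returns 0.0 for an empty list to avoid dividing by zero.
    """
    if not invocations:
        return 0.0
    cold = sum(1 for inv in invocations if inv["cold_start"])
    return cold / len(invocations)


# Hypothetical sample: 2 cold starts out of 8 invocations.
sample = [{"cold_start": i < 2} for i in range(8)]
rate = cold_start_rate(sample)
print(f"{rate:.0%}")  # prints "25%"
```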
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/lambda-cold-start.png[lambda cold start graph]
|
||||
|
||||
Cold start is also displayed in the trace waterfall, where you can drill down into individual traces and see trace metadata like AWS request ID, trigger type, and trigger request ID.
|
||||
|
||||
[role="screenshot"]
|
||||
|
|
19
docs/apm/logs.asciidoc
Normal file
|
@@ -0,0 +1,19 @@
|
|||
[role="xpack"]
|
||||
[[logs]]
|
||||
=== Logs
|
||||
|
||||
The *Logs* tab shows contextual logs for the selected service.
|
||||
|
||||
// tag::log-overview[]
|
||||
Logs provide detailed information about specific events, and are crucial to successfully debugging slow or erroneous transactions.
|
||||
|
||||
If you've correlated your application's logs and traces, you never have to search for relevant data; it's already available to you. Viewing log and trace data together allows you to quickly diagnose and solve problems.
|
||||
|
||||
To learn how to correlate your logs with your instrumented services,
|
||||
see {observability-guide}/application-logs.html[log correlation].
|
||||
// end::log-overview[]
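The general idea behind correlation is that each log line carries the identifiers of the trace it belongs to, so logs and traces can be joined later. The sketch below shows that idea with Python's standard `logging` module, writing JSON lines that include `service.name` and `trace.id` fields. It is an assumption-laden illustration: the class names and sample values are hypothetical, and in practice the linked guide describes the supported ways to add these fields.

```python
import io
import json
import logging


class TraceContextFilter(logging.Filter):
    """Attach a service name and trace ID to every log record."""

    def __init__(self, service_name, trace_id):
        super().__init__()
        self.service_name = service_name
        self.trace_id = trace_id

    def filter(self, record):
        record.service_name = self.service_name
        record.trace_id = self.trace_id
        return True  # never drop the record, only enrich it


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "message": record.getMessage(),
            "log.level": record.levelname.lower(),
            "service.name": record.service_name,
            "trace.id": record.trace_id,
        })


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("correlation-sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
# Hypothetical values; a real trace ID would come from the active trace.
logger.addFilter(TraceContextFilter("checkout", "0af7651916cd43dd8448eb211c80319c"))

logger.info("payment failed")
entry = json.loads(stream.getvalue())
print(entry)
```

With the trace ID on every line, a log viewer can filter to exactly the lines that belong to one sampled trace.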
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/logs.png[Example view of the Logs tab in the APM app in Kibana]
|
||||
|
||||
TIP: Logs displayed on this page are filtered on `service.name`.
|
|
@@ -139,9 +139,6 @@ or when to remove a large dependency.
|
|||
The cold start rate chart is currently supported for <<apm-lambda-cold-start-info,AWS Lambda>>
|
||||
functions and Azure functions.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/lambda-cold-start.png[lambda cold start graph]
|
||||
|
||||
[discrete]
|
||||
[[service-instances]]
|
||||
=== Instances
|
||||
|
|
|
@@ -10,6 +10,8 @@ To help surface potential issues, services are sorted by their health status:
|
|||
Health status is powered by <<machine-learning-integration,machine learning>>
|
||||
and requires anomaly detection to be enabled.
|
||||
|
||||
In addition to health status, active alerts for each service are prominently displayed in the service inventory table. Selecting an active alert badge brings you to the <<apm-alerts,Alerts>> tab where you can learn more about the active alert and take action.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-services-overview.png[Example view of services table the APM app in Kibana]
|
||||
|
||||
|
@@ -17,11 +19,14 @@ image::apm/images/apm-services-overview.png[Example view of services table the A
|
|||
[[service-groups]]
|
||||
==== Service groups
|
||||
|
||||
preview::[]
|
||||
beta::[]
|
||||
|
||||
Group services together to build meaningful views that remove noise and simplify investigations across services.
|
||||
Group services together to build meaningful views that remove noise, simplify investigations across services,
|
||||
and <<apm-alert-view-group,combine related alerts>>.
|
||||
Service groups are {kib} space-specific and available for any users with appropriate access.
|
||||
|
||||
// This screenshot is reused in the alerts docs
|
||||
// Ensure it has an active alert showing
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-service-group.png[Example view of service group in the APM app in Kibana]
|
||||
|
||||
|
|
|
@@ -162,12 +162,7 @@ This means you can select "Actions - View transaction in Discover" to see the ac
|
|||
|
||||
The *Logs* tab displays logs related to the sampled trace.
|
||||
|
||||
Logs provide detailed information about specific events,
|
||||
and are crucial to successfully debugging slow or erroneous transactions.
|
||||
|
||||
If you've correlated your application's logs and traces, you never have to search for relevant data;
|
||||
it's all provided on this. Viewing log and trace data together allows you to quickly diagnose
|
||||
and solve problems.
|
||||
include::./logs.asciidoc[tag=log-overview]
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-logs-tab.png[APM logs tab]
|
||||
|
|