Co-authored-by: Nathan L Smith <nathan.smith@elastic.co>
|
@ -43,6 +43,7 @@ Supported configurations are also tagged with the image:./images/dynamic-config.
|
|||
|
||||
[horizontal]
|
||||
Go Agent:: {apm-go-ref}/configuration.html[Configuration reference]
|
||||
iOS agent:: _Not yet supported_
|
||||
Java Agent:: {apm-java-ref}/configuration.html[Configuration reference]
|
||||
.NET Agent:: {apm-dotnet-ref}/configuration.html[Configuration reference]
|
||||
Node.js Agent:: {apm-node-ref}/configuration.html[Configuration reference]
|
||||
|
|
|
@ -1,69 +1,57 @@
|
|||
[role="xpack"]
|
||||
[[apm-alerts]]
|
||||
=== Alerts
|
||||
=== Alerts and rules
|
||||
|
||||
++++
|
||||
<titleabbrev>Create an alert</titleabbrev>
|
||||
++++
|
||||
|
||||
The APM app allows you to define **rules** to detect complex conditions within your APM data
|
||||
and trigger built-in **actions** when those conditions are met.
|
||||
|
||||
The APM app integrates with Kibana's {kibana-ref}/alerting-getting-started.html[alerting and actions] feature.
|
||||
It provides a set of built-in **actions** and APM-specific threshold **alerts** for you to use
|
||||
and enables central management of all alerts from <<management,Kibana Management>>.
|
||||
The following **rules** are supported:
|
||||
|
||||
* Latency anomaly rule:
|
||||
Alert when latency of a service is abnormal
|
||||
* Transaction error rate threshold rule:
|
||||
Alert when the service's transaction error rate is above the defined threshold
|
||||
* Error count threshold rule:
|
||||
Alert when the number of errors in a service exceeds a defined threshold
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-alert.png[Create an alert in the APM app]
|
||||
|
||||
For a walkthrough of the alert flyout panel, including detailed information on each configurable property,
|
||||
see Kibana's <<create-edit-rules,defining alerts>>.
|
||||
For a complete walkthrough of the **Create rule** flyout panel, including detailed information on each configurable property,
|
||||
see Kibana's <<create-edit-rules,create and edit rules>>.
|
||||
|
||||
The APM app supports four different types of alerts:
|
||||
|
||||
* Transaction duration anomaly:
|
||||
alerts when the service's transaction duration reaches a certain anomaly score
|
||||
* Transaction duration threshold:
|
||||
alerts when the service's transaction duration exceeds a given time limit over a given time frame
|
||||
* Transaction error rate threshold:
|
||||
alerts when the service's transaction error rate is above the selected rate over a given time frame
|
||||
* Error count threshold:
|
||||
alerts when a service exceeds a selected number of errors over a given time frame
|
||||
|
||||
Below, we'll walk through the creation of two of these alerts.
|
||||
Below, we'll walk through the creation of two APM rules.
|
||||
|
||||
[float]
|
||||
[[apm-create-transaction-alert]]
|
||||
=== Example: create a transaction duration alert
|
||||
=== Example: create a latency anomaly rule
|
||||
|
||||
Transaction duration alerts trigger when the duration of a specific transaction type in a service exceeds a defined threshold.
|
||||
This guide will create an alert for the `opbeans-java` service based on the following criteria:
|
||||
Latency anomaly rules trigger when the latency of a service is abnormal.
|
||||
This guide will create an alert for all services based on the following criteria:
|
||||
|
||||
* Environment: Production
|
||||
* Transaction type: `transaction.type:request`
|
||||
* Average request is above `1500ms` for the last 5 minutes
|
||||
* Check every 10 minutes, and repeat the alert every 30 minutes
|
||||
* Send the alert via Slack
|
||||
* Environment: production
|
||||
* Severity level: critical
|
||||
* Run every five minutes
|
||||
* Send an alert to a Slack channel only when the rule status changes
|
||||
|
||||
From the APM app, navigate to the `opbeans-java` service and select
|
||||
**Alerts** > **Create threshold alert** > **Transaction duration**.
|
||||
From any page in the APM app, select **Alerts and rules** > **Latency** > **Create anomaly rule**.
|
||||
Change the name of the alert, but do not edit the tags.
|
||||
|
||||
`Transaction duration | opbeans-java` is automatically set as the name of the alert,
|
||||
and `apm` and `service.name:opbeans-java` are added as tags.
|
||||
It's fine to change the name of the alert, but do not edit the tags.
|
||||
Based on the criteria above, define the following rule details:
|
||||
|
||||
Based on the alert criteria, define the following alert details:
|
||||
* **Check every** - `5 minutes`
|
||||
* **Notify** - "Only on status change"
|
||||
* **Environment** - `production`
|
||||
* **Has anomaly with severity** - `critical`
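
The two **Notify** options used in this guide, "Only on status change" and "Every time alert is active", behave differently over a series of checks. The following is a minimal Python sketch of that difference, assuming a simplified model in which each check reports only whether the rule is active. The function names and mode strings are hypothetical and are not part of Kibana's API.

```python
def should_notify(mode, previous_active, current_active):
    """Decide whether a check should send a notification.

    Hypothetical model, not Kibana's implementation:
    - "on_status_change": notify only when the active state flips.
    - "every_time_active": notify on every check where the rule is active.
    """
    if mode == "on_status_change":
        return previous_active != current_active
    if mode == "every_time_active":
        return current_active
    raise ValueError(f"unknown mode: {mode}")


def notifications(mode, checks):
    """Return the indexes of the checks that would send a notification."""
    sent = []
    previous = False  # assumption: the rule starts out inactive
    for index, active in enumerate(checks):
        if should_notify(mode, previous, active):
            sent.append(index)
        previous = active
    return sent


# Five consecutive checks: inactive, active, active, active, inactive.
checks = [False, True, True, True, False]
on_change = notifications("on_status_change", checks)
every_time = notifications("every_time_active", checks)
```

In this model, "Only on status change" sends a message when the rule becomes active and again when it stops being active, while "Every time alert is active" sends one message per active check.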
|
||||
|
||||
* **Check every** - `10 minutes`
|
||||
* **Notify every** - `30 minutes`
|
||||
* **TYPE** - `request`
|
||||
* **WHEN** - `avg`
|
||||
* **IS ABOVE** - `1500ms`
|
||||
* **FOR THE LAST** - `5 minutes`
|
||||
|
||||
Select an action type.
|
||||
Multiple action types can be selected, but in this example, we want to post to a Slack channel.
|
||||
Next, add a connector. Multiple connectors can be selected, but in this example we're interested in Slack.
|
||||
Select **Slack** > **Create a connector**.
|
||||
Enter a name for the connector,
|
||||
and paste the webhook URL.
|
||||
and paste your Slack webhook URL.
|
||||
See Slack's webhook documentation if you need to create one.
|
||||
|
||||
A default message is provided as a starting point for your alert.
|
||||
|
@ -72,35 +60,32 @@ to pass additional alert values at the time a condition is detected to an action
|
|||
A list of available variables can be accessed by selecting the
|
||||
**add variable** button image:apm/images/add-variable.png[add variable button].
|
||||
|
||||
Select **Save**. The alert has been created and is now active!
|
||||
Click **Save**. The rule has been created and is now active!
|
||||
|
||||
[float]
|
||||
[[apm-create-error-alert]]
|
||||
=== Example: create an error rate alert
|
||||
=== Example: create an error count threshold alert
|
||||
|
||||
Error rate alerts trigger when the number of errors in a service exceeds a defined threshold.
|
||||
This guide creates an alert for the `opbeans-python` service based on the following criteria:
|
||||
The error count threshold alert triggers when the number of errors in a service exceeds a defined threshold.
|
||||
This guide will create an alert for all services based on the following criteria:
|
||||
|
||||
* Environment: Production
|
||||
* All environments
|
||||
* Error rate is above 25 for the last minute
|
||||
* Check every 1 minute, and repeat the alert every 10 minutes
|
||||
* Send the alert via email to the `opbeans-python` team
|
||||
* Check every 1 minute, and alert every time the rule is active
|
||||
* Send the alert via email to the site reliability team
|
||||
|
||||
From the APM app, navigate to the `opbeans-python` service and select
|
||||
**Alerts** > **Create threshold alert** > **Error rate**.
|
||||
From any page in the APM app, select **Alerts and rules** > **Error count** > **Create threshold rule**.
|
||||
Change the name of the alert, but do not edit the tags.
|
||||
|
||||
`Error rate | opbeans-python` is automatically set as the name of the alert,
|
||||
and `apm` and `service.name:opbeans-python` are added as tags.
|
||||
It's fine to change the name of the alert, but do not edit the tags.
|
||||
|
||||
Based on the alert criteria, define the following alert details:
|
||||
Based on the criteria above, define the following rule details:
|
||||
|
||||
* **Check every** - `1 minute`
|
||||
* **Notify every** - `10 minutes`
|
||||
* **IS ABOVE** - `25 errors`
|
||||
* **FOR THE LAST** - `1 minute`
|
||||
* **Notify** - "Every time alert is active"
|
||||
* **Environment** - `all`
|
||||
* **Is above** - `25 errors`
|
||||
* **For the last** - `1 minute`
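
The **Is above** and **For the last** settings together describe a count over a sliding time window. As a rough illustration only, the condition could be sketched in Python like this. The function name, its parameters, and the sample timestamps are hypothetical, and the real rule is evaluated by Kibana against your APM data rather than an in-memory list.

```python
from datetime import datetime, timedelta


def error_count_exceeded(error_times, now, threshold=25, window=timedelta(minutes=1)):
    """Return True when more than `threshold` errors fall in (now - window, now]."""
    window_start = now - window
    recent = [t for t in error_times if window_start < t <= now]
    return len(recent) > threshold


now = datetime(2021, 1, 1, 12, 0, 0)
# 26 errors in the last 26 seconds: above the threshold of 25.
burst = [now - timedelta(seconds=s) for s in range(26)]
# 25 errors: exactly at the threshold, so the condition is not met.
steady = [now - timedelta(seconds=s) for s in range(25)]
burst_fires = error_count_exceeded(burst, now)
steady_fires = error_count_exceeded(steady, now)
```

Note that "above" is modeled here as strictly greater than the threshold, which is an assumption of the sketch.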
|
||||
|
||||
Select the **Email** action type and click **Create a connector**.
|
||||
Select the **Email** connector and click **Create a connector**.
|
||||
Fill out the required details: sender, host, port, etc., and click **Save**.
|
||||
|
||||
A default message is provided as a starting point for your alert.
|
||||
|
@ -109,14 +94,14 @@ to pass additional alert values at the time a condition is detected to an action
|
|||
A list of available variables can be accessed by selecting the
|
||||
**add variable** button image:apm/images/add-variable.png[add variable button].
|
||||
|
||||
Select **Save**. The alert has been created and is now active!
|
||||
Click **Save**. The alert has been created and is now active!
|
||||
|
||||
[float]
|
||||
[[apm-alert-manage]]
|
||||
=== Manage alerts and actions
|
||||
=== Manage alerts and rules
|
||||
|
||||
From the APM app, select **Alerts** > **View active alerts** to be taken to the Kibana alerts and actions management page.
|
||||
From this page, you can create, edit, disable, mute, and delete alerts, and create, edit, and disable connectors.
|
||||
From the APM app, select **Alerts and rules** > **Manage rules** to be taken to the Kibana **Rules and Connectors** page.
|
||||
From this page, you can disable, mute, and delete APM alerts.
|
||||
|
||||
[float]
|
||||
[[apm-alert-more-info]]
|
||||
|
@ -126,4 +111,4 @@ See {kibana-ref}/alerting-getting-started.html[alerting and actions] for more in
|
|||
|
||||
NOTE: If you are using an **on-premises** Elastic Stack deployment with security,
|
||||
communication between Elasticsearch and Kibana must have TLS configured.
|
||||
More information is in the alerting {kibana-ref}/alerting-setup.html#alerting-prerequisites[prerequisites].
|
||||
More information is in the alerting {kibana-ref}/alerting-setup.html#alerting-prerequisites[prerequisites].
|
||||
|
|
|
@ -36,6 +36,7 @@ It's vital to be consistent when naming environments in your agents.
|
|||
To learn how to configure service environments, see the specific agent documentation:
|
||||
|
||||
* *Go:* {apm-go-ref}/configuration.html#config-environment[`ELASTIC_APM_ENVIRONMENT`]
|
||||
* *iOS agent:* _Not yet supported_
|
||||
* *Java:* {apm-java-ref}/config-core.html#config-environment[`environment`]
|
||||
* *.NET:* {apm-dotnet-ref}/config-core.html#config-environment[`Environment`]
|
||||
* *Node.js:* {apm-node-ref}/configuration.html#environment[`environment`]
|
||||
|
|
|
@ -108,6 +108,7 @@ Service maps are supported for the following Agent versions:
|
|||
|
||||
[horizontal]
|
||||
Go agent:: ≥ v1.7.0
|
||||
iOS agent:: _Not yet supported_
|
||||
Java agent:: ≥ v1.13.0
|
||||
.NET agent:: ≥ v1.3.0
|
||||
Node.js agent:: ≥ v3.6.0
|
||||
|
|
|
@ -100,22 +100,22 @@ the selected transaction group.
|
|||
image::apm/images/apm-transaction-response-dist.png[Example view of response time distribution]
|
||||
|
||||
[[transaction-duration-distribution]]
|
||||
==== Transactions duration distribution
|
||||
==== Latency distribution
|
||||
|
||||
This chart plots all transaction durations for the given time period.
|
||||
A plot of all transaction durations for the given time period.
|
||||
The screenshot below shows a typical distribution,
|
||||
and indicates most of our requests were served quickly -- awesome!
|
||||
It's the requests on the right, the ones taking longer than average, that we probably want to focus on.
|
||||
It's the requests on the right, the ones taking longer than average, that we probably need to focus on.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-transaction-duration-dist.png[Example view of transactions duration distribution graph]
|
||||
image::apm/images/apm-transaction-duration-dist.png[Example view of latency distribution graph]
|
||||
|
||||
Select a transaction duration _bucket_ to display up to ten trace samples.
|
||||
Select a latency _bucket_ to display up to ten trace samples.
|
||||
|
||||
[[transaction-trace-sample]]
|
||||
==== Trace sample
|
||||
|
||||
Trace samples are based on the _bucket_ selection in the *Transactions duration distribution* chart;
|
||||
Trace samples are based on the _bucket_ selection in the *Latency distribution* chart;
|
||||
update the samples by selecting a new _bucket_.
|
||||
The number of requests per bucket is displayed when hovering over the graph,
|
||||
and the selected bucket is highlighted to stand out.
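
To make the relationship between buckets, request counts, and trace samples concrete, here is a minimal Python sketch. It assumes fixed-width buckets and made-up trace IDs. The function name and data shapes are hypothetical, and the sketch does not reflect how the APM app computes the chart internally.

```python
from collections import defaultdict

MAX_SAMPLES_PER_BUCKET = 10  # "up to ten trace samples" per selected bucket


def bucket_transactions(transactions, bucket_width_ms):
    """Group (trace_id, duration_ms) pairs into fixed-width latency buckets.

    Returns {bucket_start_ms: {"count": int, "samples": [trace_id, ...]}}.
    """
    buckets = defaultdict(lambda: {"count": 0, "samples": []})
    for trace_id, duration_ms in transactions:
        start = int(duration_ms // bucket_width_ms) * bucket_width_ms
        bucket = buckets[start]
        bucket["count"] += 1  # the number shown when hovering over the bucket
        if len(bucket["samples"]) < MAX_SAMPLES_PER_BUCKET:
            bucket["samples"].append(trace_id)
    return dict(buckets)


# Twelve fast requests and one slow one, with made-up trace IDs.
transactions = [(f"trace-{i}", 40 + i) for i in range(12)] + [("trace-slow", 950)]
histogram = bucket_transactions(transactions, bucket_width_ms=100)
```

Here the bucket starting at 0 ms counts all twelve fast requests but keeps only ten samples, while the slow request sits alone in the bucket on the right.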
|
||||
|
|
|
@ -15,6 +15,7 @@ don't forget to check our other troubleshooting guides or discussion forum:
|
|||
* {apm-server-ref}/troubleshooting.html[APM Server troubleshooting]
|
||||
* {apm-dotnet-ref}/troubleshooting.html[.NET agent troubleshooting]
|
||||
* {apm-go-ref}/troubleshooting.html[Go agent troubleshooting]
|
||||
* {apm-ios-ref}/troubleshooting.html[iOS agent troubleshooting]
|
||||
* {apm-java-ref}/trouble-shooting.html[Java agent troubleshooting]
|
||||
* {apm-node-ref}/troubleshooting.html[Node.js agent troubleshooting]
|
||||
* {apm-php-ref}/troubleshooting.html[PHP agent troubleshooting]
|
||||
|
|
|
@ -23,7 +23,7 @@ It is enabled by default.
|
|||
// Any changes made in this file will be seen there as well.
|
||||
// tag::apm-indices-settings[]
|
||||
|
||||
Index defaults can be changed in Kibana. Open the main menu, then click *APM > Settings > Indices*.
|
||||
Index defaults can be changed in the APM app. Select **Settings** > **Indices**.
|
||||
Index settings in the APM app take precedence over those set in `kibana.yml`.
|
||||
|
||||
[role="screenshot"]
|
||||
|
|