[DOCS] Add alert summaries to overview (#151817)
|
@ -3,7 +3,7 @@
|
|||
|
||||
--
|
||||
|
||||
Alerting allows you to define _rules_ to detect complex conditions within different {kib} apps and trigger actions when those conditions are met. Alerting is integrated with {observability-guide}/create-alerts.html[*Observability*], {security-guide}/prebuilt-rules.html[*Security*], <<geo-alerting,*Maps*>> and {ml-docs}/ml-configuring-alerts.html[*{ml-app}*], can be centrally managed from the <<management,*Management*>> UI, and provides a set of built-in <<action-types,connectors>> and <<stack-rules,rules>> (known as stack rules) for you to use.
|
||||
Alerting enables you to define _rules_, which detect complex conditions within different {kib} apps and trigger actions when those conditions are met. Alerting is integrated with {observability-guide}/create-alerts.html[*{observability}*], {security-guide}/prebuilt-rules.html[*Security*], <<geo-alerting,*Maps*>> and {ml-docs}/ml-configuring-alerts.html[*{ml-app}*]. It can be centrally managed from *{stack-manage-app}* and provides a set of built-in <<action-types,connectors>> and <<stack-rules,rules>> for you to use.
|
||||
|
||||
image::images/alerting-overview.png[{rules-ui} UI]
|
||||
|
||||
|
@ -12,15 +12,12 @@ image::images/alerting-overview.png[{rules-ui} UI]
|
|||
To make sure you can access alerting and actions, see the <<alerting-prerequisites,setup and prerequisites>> section.
|
||||
==============================================
|
||||
|
||||
[float]
|
||||
== Concepts and terminology
|
||||
|
||||
Alerting works by running checks on a schedule to detect conditions defined by a rule. When a condition is met, the rule tracks it as an _alert_ and responds by triggering one or more _actions_.
|
||||
Actions typically involve interaction with {kib} services or third party integrations. _Connectors_ allow actions to talk to these services and integrations.
|
||||
Actions typically involve interaction with {kib} services or third party integrations. _Connectors_ enable actions to talk to these services and integrations.
|
||||
This section describes all of these elements and how they operate together.
|
||||
|
||||
[float]
|
||||
=== Rules
|
||||
== Rules
|
||||
|
||||
A rule specifies a background task that runs on the {kib} server to check for specific conditions. {kib} provides two types of rules: stack rules that are built into {kib} and the rules that are registered by {kib} apps. For more information, refer to <<rule-types>>.
|
||||
|
||||
|
@ -42,7 +39,7 @@ The following sections describe each part of the rule in more detail.
|
|||
|
||||
[float]
|
||||
[[alerting-concepts-conditions]]
|
||||
==== Conditions
|
||||
=== Conditions
|
||||
|
||||
Under the hood, {kib} rules detect conditions by running a JavaScript function on the {kib} server, which gives them the flexibility to support a wide range of conditions, from the results of a simple {es} query to heavy computations involving data from multiple sources or external systems.
|
||||
|
||||
|
@ -55,58 +52,47 @@ See <<rule-types>> for the rules provided by {kib} and how they express their co
|
|||
|
||||
[float]
|
||||
[[alerting-concepts-scheduling]]
|
||||
==== Schedule
|
||||
=== Schedule
|
||||
|
||||
Rule schedules are defined as an interval between subsequent checks, and can range from a few seconds to months.
|
||||
|
||||
[IMPORTANT]
|
||||
==============================================
|
||||
The intervals of rule checks in {kib} are approximate. Their timing is affected by factors such as the frequency at which tasks are claimed and the task load on the system. Refer to <<alerting-production-considerations>> for more information.
|
||||
The intervals of rule checks in {kib} are approximate. Their timing is affected by factors such as the frequency at which tasks are claimed and the task load on the system. Refer to <<alerting-production-considerations,Alerting production considerations>> for more information.
|
||||
==============================================
|
||||
|
||||
[float]
|
||||
[[alerting-concepts-actions]]
|
||||
==== Actions
|
||||
=== Actions
|
||||
|
||||
Actions are invocations of connectors, which allow interaction with {kib} services or integrations with third-party systems. Actions run as background tasks on the {kib} server when rule conditions are met.
|
||||
Actions run as background tasks on the {kib} server when rule conditions are met. Recovery actions likewise run when rule conditions are no longer met. They send notifications by connecting with services inside {kib} or integrating with third-party systems.
|
||||
|
||||
When defining actions in a rule, you specify:
|
||||
|
||||
* The _connector type_: the type of service or integration to use
|
||||
* The connection for that type by referencing a <<alerting-concepts-connectors,connector>>
|
||||
* A connector
|
||||
* An action frequency
|
||||
* A mapping of rule values to properties exposed for that type of action
|
||||
|
||||
The result is a template: all the parameters needed to invoke a service are supplied except for specific values that are only known at the time the rule condition is detected.
|
||||
Rather than repeatedly entering connection information and credentials for each action, {kib} simplifies action setup using <<action-types,connectors>>. For example, if four rules send email notifications via the same SMTP service, they can all reference the same SMTP connector.
|
||||
|
||||
The _action frequency_ defines when the action runs (for example, only when the alert status changes or at specific time intervals). Each rule type also has a set of _action groups_ that affect when the action runs (for example, when the threshold is met or when the alert is recovered). If you want to reduce the number of notifications you receive without affecting their timeliness, some rule types support alert summaries. You can set the action frequency such that you receive notifications that summarize the new, ongoing, and recovered alerts at your preferred time intervals.
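As a rough illustration of what an alert summary groups together, the following sketch compares the alerts that were active at the previous summary interval with those active now. The `summarize` function and the server names are hypothetical and are not part of {kib}; they only show how alerts fall into the new, ongoing, and recovered categories.

```python
def summarize(previous_active, current_active):
    """Classify alerts as new, ongoing, or recovered between two points in time."""
    previous, current = set(previous_active), set(current_active)
    return {
        "new": sorted(current - previous),        # active now, not before
        "ongoing": sorted(current & previous),    # active before and now
        "recovered": sorted(previous - current),  # active before, not now
    }


# Hypothetical data: X123 recovered, Y456 is still active, Z789 is new.
summary = summarize(["X123", "Y456"], ["Y456", "Z789"])
```

A single summary notification built from `summary` would replace the separate per-alert notifications for these three servers.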
|
||||
|
||||
Each action definition is therefore a template: all the parameters needed to invoke a service are supplied except for specific values that are only known at the time the rule condition is detected.
|
||||
|
||||
In the server monitoring example, the `email` connector type is used, and `server` is mapped to the body of the email, using the template string `CPU on {{server}} is high`.
|
||||
|
||||
When the rule detects the condition, it creates an <<alerting-concepts-alerts,alert>> containing the details of the condition, renders the template with these details such as server name, and runs the action on the {kib} server by invoking the `email` connector type.
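The rendering step can be pictured with a small sketch. It assumes simple Mustache-style `{{name}}` placeholders; the `render_template` helper and the alert dictionaries are hypothetical and do not reflect how {kib} implements rendering internally.

```python
import re


def render_template(template, alert_context):
    """Replace each {{name}} placeholder with the matching alert value."""
    def substitute(match):
        key = match.group(1)
        # In this sketch, unknown placeholders render as an empty string.
        return str(alert_context.get(key, ""))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical alerts produced by one rule check, one per server.
alerts = [{"server": "X123"}, {"server": "Y456"}]
bodies = [render_template("CPU on {{server}} is high", alert) for alert in alerts]
```

Each entry in `bodies` corresponds to the email body for one alert, which is why the action definition can be written once and reused for every server.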
|
||||
|
||||
image::images/what-is-an-action.svg[Actions are like templates that are rendered when an alert detects a condition]
|
||||
|
||||
See <<action-types>> for details on the types of connectors provided by {kib}.
|
||||
When the rule detects the condition, it creates an alert containing the details of the condition.
|
||||
|
||||
[float]
|
||||
[[alerting-concepts-alerts]]
|
||||
=== Alerts
|
||||
== Alerts
|
||||
|
||||
When checking for a condition, a rule might identify multiple occurrences of the condition. {kib} tracks each of these *alerts* separately and takes an action per alert.
|
||||
When checking for a condition, a rule might identify multiple occurrences of the condition. {kib} tracks each of these alerts separately. Depending on the action frequency, an action occurs per alert or at the specified alert summary interval.
|
||||
|
||||
Using the server monitoring example, each server with average CPU > 0.9 is tracked as an alert. This means a separate email is sent for each server that exceeds the threshold.
|
||||
Using the server monitoring example, each server with average CPU > 0.9 is tracked as an alert. This means a separate email is sent for each server that exceeds the threshold whenever the alert status changes.
|
||||
|
||||
image::images/alerts.svg[{kib} tracks each detected condition as an alert and takes action on each alert]
|
||||
|
||||
[float]
|
||||
[[alerting-concepts-connectors]]
|
||||
=== Connectors
|
||||
|
||||
Actions often involve connecting with services inside {kib} or integrating with third-party systems.
|
||||
Rather than repeatedly entering connection information and credentials for each action, {kib} simplifies action setup using connectors.
|
||||
|
||||
Connectors provide a central place to store connection information for services and integrations. For example, if four rules send email notifications via the same SMTP service, they can all reference the same SMTP connector. When the SMTP settings change, you can update them once in the connector, instead of having to update four rules.
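The benefit of referencing a shared connector can be sketched as follows. The `Connector` and `Rule` classes, the connector name, and the host names are hypothetical stand-ins, not {kib} objects; the point is only that rules hold a reference to the connection settings rather than a copy.

```python
from dataclasses import dataclass


@dataclass
class Connector:
    """Hypothetical stand-in for centrally stored connection settings."""
    name: str
    host: str
    port: int


@dataclass
class Rule:
    name: str
    connector: Connector  # a reference, not a copy of the settings


smtp = Connector(name="team-smtp", host="smtp.example.com", port=25)
rules = [Rule(name=f"rule-{i}", connector=smtp) for i in range(1, 5)]

# One update to the connector is visible from all four rules.
smtp.host = "smtp2.example.com"
hosts = {rule.connector.host for rule in rules}
```

After the update, `hosts` contains a single value, so none of the four rules needs to be edited.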
|
||||
|
||||
image::images/rule-concepts-connectors.svg[Connectors provide a central place to store service connection settings]
|
||||
|
||||
[float]
|
||||
== Putting it all together
|
||||
|
||||
|
@ -114,10 +100,10 @@ A rule consists of conditions, actions, and a schedule. When conditions are met,
|
|||
|
||||
image::images/rule-concepts-summary.svg[Rules, connectors, alerts and actions work together to convert detection into action]
|
||||
|
||||
. Anytime a rule's conditions are met, an alert is created. This example checks for servers with average CPU > 0.9. Three servers meet the condition, so three alerts are created.
|
||||
. Alerts create actions as long as they are not muted or throttled. When actions are created, the template that was set up in the rule is filled with actual values. In this example, three actions are created, and the template string {{server}} is replaced with the server name for each alert.
|
||||
. {kib} invokes the actions, sending them to a third party integration like an email service.
|
||||
. If the third party integration has connection parameters or credentials, {kib} will fetch these from the connector referenced in the action.
|
||||
. Any time a rule's conditions are met, an alert is created. This example checks for servers with average CPU > 0.9. Three servers meet the condition, so three alerts are created.
|
||||
. Alerts create actions according to the action frequency, as long as they are not muted or throttled. When actions are created, their properties are filled with actual values. In this example, three actions are created when the threshold is met, and the template string {{server}} is replaced with the appropriate server name for each alert.
|
||||
. {kib} runs the actions, sending notifications by using a third party integration like an email service.
|
||||
. If the third party integration has connection parameters or credentials, {kib} fetches these from the appropriate connector.
|
||||
|
||||
[float]
|
||||
[[alerting-concepts-differences]]
|
||||
|
@ -135,7 +121,7 @@ Functionally, the {alert-features} differ in that:
|
|||
* Scheduled checks are run on {kib} instead of {es}
|
||||
* {kib} <<alerting-concepts-conditions,rules hide the details of detecting conditions>> through rule types, whereas watches provide low-level control over inputs, conditions, and transformations.
|
||||
* {kib} rules track and persist the state of each detected condition through alerts. This makes it possible to mute and throttle individual alerts, and detect changes in state such as resolution.
|
||||
* Actions are linked to alerts in Alerting. Actions are fired for each occurrence of a detected condition, rather than for the entire rule.
|
||||
* Actions are linked to alerts. Actions are fired for each occurrence of a detected condition, rather than for the entire rule.
|
||||
|
||||
At a higher level, the {alert-features} allow rich integrations across use cases like <<xpack-apm,*APM*>>, <<metrics-app,*Metrics*>>, <<xpack-siem,*Security*>>, and <<uptime-app,*Uptime*>>.
|
||||
Prepackaged rule types simplify setup and hide the details of complex, domain-specific detections, while providing a consistent interface across {kib}.
|
||||
|
|
|
@ -79,21 +79,21 @@ Each connector enables different action properties. For example, an email connec
|
|||
[[alerting-concepts-suppressing-duplicate-notifications]]
|
||||
[TIP]
|
||||
==============================================
|
||||
If you are not using alert summaries, actions are triggered per alert and a rule can end up generating a large number of actions. Take the following example where a rule is monitoring three servers every minute for CPU usage > 0.9, and the rule is set to notify `On check intervals`:
|
||||
If you are not using alert summaries, actions are triggered per alert and a rule can end up generating a large number of actions. Take the following example where a rule is monitoring three servers every minute for CPU usage > 0.9, and the action frequency is `On check intervals`:
|
||||
|
||||
* Minute 1: server X123 > 0.9. _One email_ is sent for server X123.
|
||||
* Minute 2: X123 and Y456 > 0.9. _Two emails_ are sent, one for X123 and one for Y456.
|
||||
* Minute 3: X123, Y456, Z789 > 0.9. _Three emails_ are sent, one for each of X123, Y456, Z789.
|
||||
|
||||
In this example, three emails are sent for server X123 in the span of 3 minutes for the same rule. Often, it's desirable to suppress these re-notifications. If
|
||||
you set the rule notify setting to `On custom action intervals` with an interval of 5 minutes, you reduce noise by getting emails only every 5 minutes for
|
||||
you set the action frequency to `On custom action intervals` with an interval of 5 minutes, you reduce noise by getting emails only every 5 minutes for
|
||||
servers that continue to exceed the threshold:
|
||||
|
||||
* Minute 1: server X123 > 0.9. _One email_ is sent for server X123.
|
||||
* Minute 2: X123 and Y456 > 0.9. _One email_ is sent for Y456.
|
||||
* Minute 3: X123, Y456, Z789 > 0.9. _One email_ is sent for Z789.
|
||||
* Minute 1: server X123 > 0.9. _One email_ will be sent for server X123.
|
||||
* Minute 2: X123 and Y456 > 0.9. _One email_ will be sent for Y456.
|
||||
* Minute 3: X123, Y456, Z789 > 0.9. _One email_ will be sent for Z789.
|
||||
|
||||
To get notified only once when a server exceeds the threshold, you can set the rule notify setting to `On status changes`.
|
||||
To get notified only once when a server exceeds the threshold, you can set the action frequency to `On status changes`. Alternatively, if the rule type supports alert summaries, consider using them to reduce the volume of notifications.
|
||||
==============================================
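The three action frequency settings discussed in the tip above can be compared with a simplified simulation. The `simulate` function and its frequency labels are hypothetical; the sketch models only per-alert notifications for active alerts and ignores recovery actions and alert summaries.

```python
def simulate(checks, frequency, throttle_minutes=5):
    """Return {minute: [servers notified]} for a simplified per-alert model."""
    last_notified = {}         # server -> minute of its last notification
    previously_active = set()  # servers that were active at the previous check
    sent = {}
    for minute, active in checks:
        notified = []
        for server in active:
            if frequency == "check_intervals":
                notify = True  # notify on every check while the alert is active
            elif frequency == "custom_intervals":
                last = last_notified.get(server)
                notify = last is None or minute - last >= throttle_minutes
            elif frequency == "status_changes":
                notify = server not in previously_active
            else:
                raise ValueError(f"unknown frequency: {frequency}")
            if notify:
                last_notified[server] = minute
                notified.append(server)
        previously_active = set(active)
        sent[minute] = notified
    return sent


# The three checks from the example: (minute, servers above the threshold).
checks = [(1, ["X123"]), (2, ["X123", "Y456"]), (3, ["X123", "Y456", "Z789"])]
every_check = simulate(checks, "check_intervals")
throttled = simulate(checks, "custom_intervals")
on_change = simulate(checks, "status_changes")
```

Over these three minutes, `throttled` and `on_change` produce the same notifications. They diverge later: with a 5-minute interval, a server that is still above the threshold at minute 6 is notified again, whereas with status changes it is not.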
|
||||
|
||||
[float]
|
||||
|
|