mirror of
https://github.com/elastic/kibana.git
synced 2025-04-23 17:28:26 -04:00
[Alerting][Docs] Moving alerting setup to its own page (#101323)
* Restructuring main alerting page. Adding separate setup page
* Fixing links
* Moving suppressing duplicate notifications section
* Adding redirect
* Reverting redirect. Adding placeholder link
* Adding placeholder text
* Apply suggestions from code review

Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com>

* Setup page PR fixes
* Alerting page PR fixes
* Update docs/user/alerting/alerting-setup.asciidoc

Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com>

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com>
parent
9417b699b8
commit
5a1f370580
9 changed files with 111 additions and 90 deletions
|
@ -126,4 +126,4 @@ See {kibana-ref}/alerting-getting-started.html[alerting and actions] for more in
|
|||
|
||||
NOTE: If you are using an **on-premise** Elastic Stack deployment with security,
|
||||
communication between Elasticsearch and Kibana must have TLS configured.
|
||||
More information is in the alerting {kibana-ref}/alerting-getting-started.html#alerting-setup-prerequisites[prerequisites].
|
||||
More information is in the alerting {kibana-ref}/alerting-setup.html#alerting-prerequisites[prerequisites].
|
|
@ -11,7 +11,7 @@ image::images/alerting-overview.png[Rules and Connectors UI]
|
|||
|
||||
[IMPORTANT]
|
||||
==============================================
|
||||
To make sure you can access alerting and actions, see the <<alerting-setup-prerequisites, setup and pre-requisites>> section.
|
||||
To make sure you can access alerting and actions, see the <<alerting-prerequisites, setup and pre-requisites>> section.
|
||||
==============================================
|
||||
|
||||
[float]
|
||||
|
@ -22,7 +22,7 @@ Actions typically involve interaction with {kib} services or third party integra
|
|||
This section describes all of these elements and how they operate together.
|
||||
|
||||
[float]
|
||||
=== What is a rule?
|
||||
=== Rules
|
||||
|
||||
A rule specifies a background task that runs on the {kib} server to check for specific conditions. It consists of three main parts:
|
||||
|
||||
|
@ -30,7 +30,10 @@ A rule specifies a background task that runs on the {kib} server to check for sp
|
|||
* *Schedule*: when/how often should detection checks run?
|
||||
* *Actions*: what happens when a condition is detected?
|
||||
|
||||
For example, when monitoring a set of servers, a rule might check for average CPU usage > 0.9 on each server for the last two minutes (condition), checked every minute (schedule), sending a warning email message via SMTP with subject `CPU on {{server}} is high` (action).
|
||||
For example, when monitoring a set of servers, a rule might:
|
||||
* Check for average CPU usage > 0.9 on each server for the last two minutes (condition).
|
||||
* Check every minute (schedule).
|
||||
* Send a warning email message via SMTP with subject `CPU on {{server}} is high` (action).
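
The three parts of the server monitoring rule can be sketched in TypeScript. This is a minimal illustration only: `ServerSample`, `RuleSketch`, `cpuRule`, and `runCheck` are hypothetical names invented for this example and are not part of any {kib} API.

```typescript
// Hypothetical shapes for illustration; these are NOT Kibana's actual types.
interface ServerSample {
  server: string;
  avgCpu: number; // average CPU usage over the last two minutes, 0..1
}

interface RuleSketch {
  condition: (sample: ServerSample) => boolean; // what needs to be detected
  scheduleSeconds: number; // how often the check runs
  actionSubject: string; // email subject template used by the action
}

// The example from the text: CPU > 0.9, checked every minute, email on detection.
const cpuRule: RuleSketch = {
  condition: (sample) => sample.avgCpu > 0.9,
  scheduleSeconds: 60,
  actionSubject: "CPU on {{server}} is high",
};

// One scheduled check: return the servers for which the condition holds.
function runCheck(rule: RuleSketch, samples: ServerSample[]): string[] {
  return samples.filter(rule.condition).map((sample) => sample.server);
}
```

Keeping the condition, schedule, and action as separate fields mirrors the way the text describes a rule; a real rule type would supply the condition logic itself and expose only parameters.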
|
||||
|
||||
image::images/what-is-a-rule.svg[Three components of a rule]
|
||||
|
||||
|
@ -40,7 +43,7 @@ The following sections describe each part of the rule in more detail.
|
|||
[[alerting-concepts-conditions]]
|
||||
==== Conditions
|
||||
|
||||
Under the hood, {kib} rules detect conditions by running a javascript function on the {kib} server, which gives it the flexibility to support a wide range of conditions, anything from the results of a simple {es} query to heavy computations involving data from multiple sources or external systems.
|
||||
Under the hood, {kib} rules detect conditions by running a JavaScript function on the {kib} server, which gives them the flexibility to support a wide range of conditions, from the results of a simple {es} query to heavy computations involving data from multiple sources or external systems.
|
||||
|
||||
These conditions are packaged and exposed as *rule types*. A rule type hides the underlying details of the condition, and exposes a set of parameters
|
||||
to control the details of the conditions to detect.
|
||||
|
@ -68,22 +71,22 @@ Actions are invocations of connectors, which allow interaction with {kib} servic
|
|||
|
||||
When defining actions in a rule, you specify:
|
||||
|
||||
* the *connector type*: the type of service or integration to use
|
||||
* the connection for that type by referencing a <<alerting-concepts-connectors, connector>>
|
||||
* a mapping of rule values to properties exposed for that type of action
|
||||
* The *connector type*: the type of service or integration to use
|
||||
* The connection for that type by referencing a <<alerting-concepts-connectors, connector>>
|
||||
* A mapping of rule values to properties exposed for that type of action
|
||||
|
||||
The result is a template: all the parameters needed to invoke a service are supplied except for specific values that are only known at the time the rule condition is detected.
|
||||
|
||||
In the server monitoring example, the `email` connector type is used, and `server` is mapped to the body of the email, using the template string `CPU on {{server}} is high`.
|
||||
|
||||
When the rule detects the condition, it creates an <<alerting-concepts-alert-instances, alert>> containing the details of the condition, renders the template with these details such as server name, and executes the action on the {kib} server by invoking the `email` connector type.
|
||||
When the rule detects the condition, it creates an <<alerting-concepts-alerts, alert>> containing the details of the condition, renders the template with these details such as server name, and executes the action on the {kib} server by invoking the `email` connector type.
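
The template rendering step described above can be approximated with a small substitution function. This is a simplified sketch, not the template engine {kib} uses: `renderTemplate` is a hypothetical helper, it handles only plain `{{name}}` placeholders, and it leaves unknown placeholders untouched, which is an assumption made for this example.

```typescript
// Hypothetical helper: replaces {{name}} placeholders with values from an alert.
// Simplified on purpose; unknown placeholders are left as written.
function renderTemplate(
  template: string,
  values: Record<string, string>
): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}

// The alert supplies the values that are only known at detection time.
const subject = renderTemplate("CPU on {{server}} is high", { server: "X123" });
console.log(subject); // prints "CPU on X123 is high"
```

The action definition holds everything except the per-alert values, so the same template can be rendered once for each alert the rule creates.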
|
||||
|
||||
image::images/what-is-an-action.svg[Actions are like templates that are rendered when an alert detects a condition]
|
||||
|
||||
See <<action-types>> for details on the types of connectors provided by {kib}.
|
||||
|
||||
[float]
|
||||
[[alerting-concepts-alert-instances]]
|
||||
[[alerting-concepts-alerts]]
|
||||
=== Alerts
|
||||
|
||||
When checking for a condition, a rule might identify multiple occurrences of the condition. {kib} tracks each of these *alerts* separately and takes an action per alert.
|
||||
|
@ -92,22 +95,6 @@ Using the server monitoring example, each server with average CPU > 0.9 is track
|
|||
|
||||
image::images/alerts.svg[{kib} tracks each detected condition as an alert and takes action on each alert]
|
||||
|
||||
[float]
|
||||
[[alerting-concepts-suppressing-duplicate-notifications]]
|
||||
=== Suppressing duplicate notifications
|
||||
|
||||
Since actions are executed per alert, a rule can end up generating a large number of actions. Take the following example where a rule is monitoring three servers every minute for CPU usage > 0.9:
|
||||
|
||||
* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
|
||||
* Minute 2: X123 and Y456 > 0.9. *Two emails* are sent, one for X123 and one for Y456.
|
||||
* Minute 3: X123, Y456, Z789 > 0.9. *Three emails* are sent, one for each of X123, Y456, Z789.
|
||||
|
||||
In the above example, three emails are sent for server X123 in the span of 3 minutes for the same rule. Often it's desirable to suppress frequent re-notification. Operations like muting and throttling can be applied at the alert level. If we set the rule re-notify interval to 5 minutes, we reduce noise by only getting emails for new servers that exceed the threshold:
|
||||
|
||||
* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
|
||||
* Minute 2: X123 and Y456 > 0.9. *One email* is sent for Y456.
|
||||
* Minute 3: X123, Y456, Z789 > 0.9. *One email* is sent for Z789.
|
||||
|
||||
[float]
|
||||
[[alerting-concepts-connectors]]
|
||||
=== Connectors
|
||||
|
@ -120,7 +107,7 @@ Rather than repeatedly entering connection information and credentials for each
|
|||
image::images/rule-concepts-connectors.svg[Connectors provide a central place to store service connection settings]
|
||||
|
||||
[float]
|
||||
=== Summary
|
||||
== Putting it all together
|
||||
|
||||
A *rule* consists of conditions, *actions*, and a schedule. When conditions are met, *alerts* are created that render *actions* and invoke them. To make action setup and update easier, actions use *connectors* that centralize the information used to connect with {kib} services and third-party integrations. The following example ties these concepts together:
|
||||
|
||||
|
@ -131,7 +118,6 @@ image::images/rule-concepts-summary.svg[Rules, connectors, alerts and actions wo
|
|||
. {kib} invokes the actions, sending them to a third party *integration* like an email service.
|
||||
. If the third party integration has connection parameters or credentials, {kib} will fetch these from the *connector* referenced in the action.
|
||||
|
||||
|
||||
[float]
|
||||
[[alerting-concepts-differences]]
|
||||
== Differences from Watcher
|
||||
|
@ -152,63 +138,7 @@ Pre-packaged *rule types* simplify setup and hide the details of complex, domain
|
|||
|
||||
[float]
|
||||
[[alerting-setup-prerequisites]]
|
||||
== Setup and prerequisites
|
||||
== Prerequisites
|
||||
<<alerting-prerequisites, Alerting prerequisites>>
|
||||
|
||||
If you are using an *on-premises* Elastic Stack deployment:
|
||||
|
||||
* In the kibana.yml configuration file, add the <<general-alert-action-settings,`xpack.encryptedSavedObjects.encryptionKey`>> setting.
|
||||
* For emails to have a footer with a link back to {kib}, set the <<server-publicBaseUrl, `server.publicBaseUrl`>> configuration setting.
|
||||
|
||||
If you are using an *on-premises* Elastic Stack deployment with <<using-kibana-with-security, *security*>>:
|
||||
|
||||
* You must enable Transport Layer Security (TLS) for communication <<configuring-tls-kib-es, between {es} and {kib}>>. {kib} alerting uses <<api-keys, API keys>> to secure background rule checks and actions, and API keys require {ref}/configuring-tls.html#tls-http[TLS on the HTTP interface]. A proxy will not suffice.
|
||||
|
||||
[float]
|
||||
[[alerting-setup-production]]
|
||||
== Production considerations and scaling guidance
|
||||
|
||||
When relying on alerting and actions as mission critical services, make sure you follow the <<alerting-production-considerations,Alerting production considerations>>.
|
||||
|
||||
See <<alerting-scaling-guidance>> for more information on the scalability of {kib} alerting.
|
||||
|
||||
[float]
|
||||
[[alerting-security]]
|
||||
== Security
|
||||
|
||||
To access alerting in a space, a user must have access to one of the following features:
|
||||
|
||||
* Alerting
|
||||
* <<xpack-apm,*APM*>>
|
||||
* <<logs-app,*Logs*>>
|
||||
* <<xpack-ml,*{ml-cap}*>>
|
||||
* <<metrics-app,*Metrics*>>
|
||||
* <<xpack-siem,*Security*>>
|
||||
* <<uptime-app,*Uptime*>>
|
||||
|
||||
See <<kibana-feature-privileges, feature privileges>> for more information on configuring roles that provide access to these features.
|
||||
Also note that a user will need +read+ privileges for the *Actions and Connectors* feature to attach actions to a rule or to edit a rule that has an action attached to it.
|
||||
|
||||
[float]
|
||||
[[alerting-spaces]]
|
||||
=== Space isolation
|
||||
|
||||
Rules and connectors are isolated to the {kib} space in which they were created. A rule or connector created in one space will not be visible in another.
|
||||
|
||||
[float]
|
||||
[[alerting-authorization]]
|
||||
=== Authorization
|
||||
|
||||
Rules, including all background detection and the actions they generate, are authorized using an <<api-keys, API key>> associated with the last user to edit the rule. Upon creating or modifying a rule, an API key is generated for that user, capturing a snapshot of their privileges at that moment in time. The API key is then used to run all background tasks associated with the rule, including detection checks and executing actions.
|
||||
|
||||
[IMPORTANT]
|
||||
==============================================
|
||||
If a rule requires certain privileges to run, such as index privileges, keep in mind that if a user without those privileges updates the rule, the rule will no longer function.
|
||||
==============================================
|
||||
|
||||
[float]
|
||||
[[alerting-restricting-actions]]
|
||||
=== Restricting actions
|
||||
|
||||
For security reasons you may wish to limit the extent to which {kib} can connect to external services. <<action-settings>> allows you to disable certain <<action-types>> and allowlist the hostnames that {kib} can connect with.
|
||||
|
||||
--
|
||||
--
|
68
docs/user/alerting/alerting-setup.asciidoc
Normal file
|
@ -0,0 +1,68 @@
|
|||
[role="xpack"]
|
||||
[[alerting-setup]]
|
||||
== Alerting Setup
|
||||
++++
|
||||
<titleabbrev>Setup</titleabbrev>
|
||||
++++
|
||||
|
||||
The Alerting feature is automatically enabled in {kib}, but might require some additional configuration.
|
||||
|
||||
[float]
|
||||
[[alerting-prerequisites]]
|
||||
=== Prerequisites
|
||||
If you are using an *on-premises* Elastic Stack deployment:
|
||||
|
||||
* In the kibana.yml configuration file, add the <<general-alert-action-settings,`xpack.encryptedSavedObjects.encryptionKey`>> setting.
|
||||
* For emails to have a footer with a link back to {kib}, set the <<server-publicBaseUrl, `server.publicBaseUrl`>> configuration setting.
|
||||
|
||||
If you are using an *on-premises* Elastic Stack deployment with <<using-kibana-with-security, *security*>>:
|
||||
|
||||
* You must enable Transport Layer Security (TLS) for communication <<configuring-tls-kib-es, between {es} and {kib}>>. {kib} alerting uses <<api-keys, API keys>> to secure background rule checks and actions, and API keys require {ref}/configuring-tls.html#tls-http[TLS on the HTTP interface]. A proxy will not suffice.
|
||||
|
||||
[float]
|
||||
[[alerting-setup-production]]
|
||||
=== Production considerations and scaling guidance
|
||||
|
||||
When relying on alerting and actions as mission critical services, make sure you follow the <<alerting-production-considerations,Alerting production considerations>>.
|
||||
|
||||
See <<alerting-scaling-guidance>> for more information on the scalability of {kib} alerting.
|
||||
|
||||
[float]
|
||||
[[alerting-security]]
|
||||
=== Security
|
||||
|
||||
To access alerting in a space, a user must have access to one of the following features:
|
||||
|
||||
* Alerting
|
||||
* <<xpack-apm,*APM*>>
|
||||
* <<logs-app,*Logs*>>
|
||||
* <<xpack-ml,*{ml-cap}*>>
|
||||
* <<metrics-app,*Metrics*>>
|
||||
* <<xpack-siem,*Security*>>
|
||||
* <<uptime-app,*Uptime*>>
|
||||
|
||||
See <<kibana-feature-privileges, feature privileges>> for more information on configuring roles that provide access to these features.
|
||||
Also note that a user will need +read+ privileges for the *Actions and Connectors* feature to attach actions to a rule or to edit a rule that has an action attached to it.
|
||||
|
||||
[float]
|
||||
[[alerting-restricting-actions]]
|
||||
==== Restrict actions
|
||||
|
||||
For security reasons you may wish to limit the extent to which {kib} can connect to external services. <<action-settings>> allows you to disable certain <<action-types>> and allowlist the hostnames that {kib} can connect with.
|
||||
|
||||
[float]
|
||||
[[alerting-spaces]]
|
||||
=== Space isolation
|
||||
|
||||
Rules and connectors are isolated to the {kib} space in which they were created. A rule or connector created in one space will not be visible in another.
|
||||
|
||||
[float]
|
||||
[[alerting-authorization]]
|
||||
=== Authorization
|
||||
|
||||
Rules, including all background detection and the actions they generate, are authorized using an <<api-keys, API key>> associated with the last user to edit the rule. Upon creating or modifying a rule, an API key is generated for that user, capturing a snapshot of their privileges at that moment in time. The API key is then used to run all background tasks associated with the rule, including detection checks and executing actions.
|
||||
|
||||
[IMPORTANT]
|
||||
==============================================
|
||||
If a rule requires certain privileges to run, such as index privileges, keep in mind that if a user without those privileges updates the rule, the rule will no longer function.
|
||||
==============================================
|
|
@ -1,6 +1,9 @@
|
|||
[role="xpack"]
|
||||
[[alerting-troubleshooting]]
|
||||
== Alerting Troubleshooting
|
||||
++++
|
||||
<titleabbrev>Troubleshooting</titleabbrev>
|
||||
++++
|
||||
|
||||
This page describes how to resolve common problems you might encounter with Alerting.
|
||||
If your problem isn’t described here, please review open issues in the following GitHub repositories:
|
||||
|
|
|
@ -32,6 +32,25 @@ Notify:: This value limits how often actions are repeated when an alert rem
|
|||
- **Every time alert is active**: Actions are repeated when an alert remains active across checks.
|
||||
- **On a custom action interval**: Actions are suppressed for the throttle interval, but repeat when an alert remains active across checks for a duration longer than the throttle interval.
|
||||
|
||||
[float]
|
||||
[[alerting-concepts-suppressing-duplicate-notifications]]
|
||||
[NOTE]
|
||||
==============================================
|
||||
Since actions are executed per alert, a rule can end up generating a large number of actions. Take the following example where a rule is monitoring three servers every minute for CPU usage > 0.9, and the rule is set to notify **Every time alert is active**:
|
||||
|
||||
* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
|
||||
* Minute 2: X123 and Y456 > 0.9. *Two emails* are sent, one for X123 and one for Y456.
|
||||
* Minute 3: X123, Y456, Z789 > 0.9. *Three emails* are sent, one for each of X123, Y456, Z789.
|
||||
|
||||
In the above example, three emails are sent for server X123 in the span of 3 minutes for the same rule. Often, it's desirable to suppress these re-notifications. If you set the rule **Notify** setting to **On a custom action interval** with an interval of 5 minutes, you reduce noise by only getting emails every 5 minutes for servers that continue to exceed the threshold:
|
||||
|
||||
* Minute 1: server X123 > 0.9. *One email* is sent for server X123.
|
||||
* Minute 2: X123 and Y456 > 0.9. *One email* is sent for Y456.
|
||||
* Minute 3: X123, Y456, Z789 > 0.9. *One email* is sent for Z789.
|
||||
|
||||
To get notified **only once** when a server exceeds the threshold, you can set the rule's **Notify** setting to **Only on status change**.
|
||||
==============================================
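
The effect of the three **Notify** options on the example above can be simulated with a short TypeScript sketch. It is a simplified model under stated assumptions, not {kib}'s implementation: `NotifyMode`, `shouldNotify`, and `simulate` are hypothetical names, time is counted in whole minutes, and an alert's throttle is tracked only by the minute of its last notification.

```typescript
// Hypothetical model of the three Notify options; not Kibana's actual logic.
type NotifyMode =
  | { kind: "everyTime" }
  | { kind: "customInterval"; intervalMinutes: number }
  | { kind: "onStatusChange" };

function shouldNotify(
  mode: NotifyMode,
  server: string,
  minute: number,
  lastNotified: Map<string, number>,
  previouslyActive: Set<string>
): boolean {
  if (mode.kind === "everyTime") {
    return true;
  }
  if (mode.kind === "onStatusChange") {
    // Notify only when the server was not active on the previous check.
    return !previouslyActive.has(server);
  }
  // customInterval: suppress until the throttle interval has elapsed.
  const last = lastNotified.get(server);
  return last === undefined || minute - last >= mode.intervalMinutes;
}

// activeByMinute[i] lists the servers above the threshold at minute i + 1.
// Returns, per minute, the servers for which an email would be sent.
function simulate(activeByMinute: string[][], mode: NotifyMode): string[][] {
  const lastNotified = new Map<string, number>();
  let previouslyActive = new Set<string>();
  return activeByMinute.map((active, index) => {
    const minute = index + 1;
    const notified = active.filter((server) =>
      shouldNotify(mode, server, minute, lastNotified, previouslyActive)
    );
    notified.forEach((server) => lastNotified.set(server, minute));
    previouslyActive = new Set(active);
    return notified;
  });
}

const minutes = [["X123"], ["X123", "Y456"], ["X123", "Y456", "Z789"]];
console.log(simulate(minutes, { kind: "everyTime" }).map((n) => n.length));
console.log(simulate(minutes, { kind: "customInterval", intervalMinutes: 5 }));
```

With the inputs from the example, the first call yields one, two, and three emails, and the second yields one email per minute for the newly detected server only.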
|
||||
|
||||
|
||||
[float]
|
||||
[[defining-alerts-type-conditions]]
|
||||
|
|
|
@ -1,4 +1,5 @@
|
|||
include::alerting-getting-started.asciidoc[]
|
||||
include::alerting-setup.asciidoc[]
|
||||
include::defining-rules.asciidoc[]
|
||||
include::rule-management.asciidoc[]
|
||||
include::rule-details.asciidoc[]
|
||||
|
|
|
@ -19,7 +19,7 @@ image::user/alerting/images/rule-types-index-threshold-conditions.png[Five claus
|
|||
|
||||
Index:: This clause requires an *index or index pattern* and a *time field* that will be used for the *time window*.
|
||||
When:: This clause specifies how the value to be compared to the threshold is calculated. The value is calculated by aggregating a numeric field over the *time window*. The aggregation options are: `count`, `average`, `sum`, `min`, and `max`. When using `count`, the document count is used, and an aggregation field is not necessary.
|
||||
Over/Grouped Over:: This clause lets you configure whether the aggregation is applied over all documents, or should be split into groups using a grouping field. If grouping is used, an <<alerting-concepts-alert-instances, alert>> will be created for each group when it exceeds the threshold. To limit the number of alerts on high cardinality fields, you must specify the number of groups to check against the threshold. Only the *top* groups are checked.
|
||||
Over/Grouped Over:: This clause lets you configure whether the aggregation is applied over all documents, or should be split into groups using a grouping field. If grouping is used, an <<alerting-concepts-alerts, alert>> will be created for each group when it exceeds the threshold. To limit the number of alerts on high cardinality fields, you must specify the number of groups to check against the threshold. Only the *top* groups are checked.
|
||||
Threshold:: This clause defines a threshold value and a comparison operator (one of `is above`, `is above or equals`, `is below`, `is below or equals`, or `is between`). The result of the aggregation is compared to this threshold.
|
||||
Time window:: This clause determines how far back to search for documents, using the *time field* set in the *index* clause. Generally, this value should be set to a value higher than the *check every* value in the <<defining-alerts-general-details, general rule details>>, to avoid gaps in detection.
|
||||
|
||||
|
|
|
@ -265,7 +265,7 @@ export class DocLinksService {
|
|||
preconfiguredConnectors: `${KIBANA_DOCS}pre-configured-connectors.html`,
|
||||
preconfiguredAlertHistoryConnector: `${KIBANA_DOCS}index-action-type.html#preconfigured-connector-alert-history`,
|
||||
serviceNowAction: `${KIBANA_DOCS}servicenow-action-type.html#configuring-servicenow`,
|
||||
setupPrerequisites: `${KIBANA_DOCS}alerting-getting-started.html#alerting-setup-prerequisites`,
|
||||
setupPrerequisites: `${KIBANA_DOCS}alerting-setup.html#alerting-prerequisites`,
|
||||
slackAction: `${KIBANA_DOCS}slack-action-type.html#configuring-slack`,
|
||||
teamsAction: `${KIBANA_DOCS}teams-action-type.html#configuring-teams`,
|
||||
},
|
||||
|
|
|
@ -184,7 +184,7 @@ describe('health check', () => {
|
|||
const action = queryByText(/Learn/i);
|
||||
expect(action!.textContent).toMatchInlineSnapshot(`"Learn how.(opens in a new tab or window)"`);
|
||||
expect(action!.getAttribute('href')).toMatchInlineSnapshot(
|
||||
`"https://www.elastic.co/guide/en/kibana/mocked-test-branch/alerting-getting-started.html#alerting-setup-prerequisites"`
|
||||
`"https://www.elastic.co/guide/en/kibana/mocked-test-branch/alerting-setup.html#alerting-prerequisites"`
|
||||
);
|
||||
});
|
||||
});
|
||||
|
|