[Synthetics] fix alert actions for synthetics monitor status rule (#151900)

## Summary

Resolves https://github.com/elastic/kibana/issues/151796
Resolves https://github.com/elastic/kibana/issues/151795

The index alert connector is shared by both legacy uptime and
synthetics, but the context shape differs between the two alert
types. To account for this, an `isLegacy` boolean was added that
determines which index connector configuration is used.
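
For orientation, here is a condensed sketch of how the new flag selects the
document shape. The real change is to `getIndexActionParams` in the diff below;
the standalone function, its name, and the simplified types here are
illustrative only, and the legacy recovery variant is omitted for brevity.

```
// Condensed sketch of the branching added in this PR: the synthetics rule
// exposes different context fields than legacy uptime, so the index connector
// document must be built differently for each rule type.
interface IndexDocumentParams {
  documents: Array<Record<string, string>>;
  indexOverride: null;
}

function getSyntheticsIndexParams(recovery = false, isLegacy = false): IndexDocumentParams {
  if (isLegacy) {
    // Legacy uptime context shape, unchanged by this PR (recovery variant omitted)
    return {
      documents: [
        {
          monitorName: '{{context.monitorName}}',
          monitorUrl: '{{{context.monitorUrl}}}',
          statusMessage: '{{{context.statusMessage}}}',
          latestErrorMessage: '{{{context.latestErrorMessage}}}',
          observerLocation: '{{context.observerLocation}}',
        },
      ],
      indexOverride: null,
    };
  }
  // Synthetics context shape; the recovery document also records why the alert resolved
  return {
    documents: [
      {
        monitorName: '{{context.monitorName}}',
        monitorUrl: '{{{context.monitorUrl}}}',
        statusMessage: '{{{context.status}}}',
        latestErrorMessage: '{{{context.lastErrorMessage}}}',
        observerLocation: '{{context.locationName}}',
        ...(recovery ? { recoveryReason: '{{context.recoveryReason}}' } : {}),
      },
    ],
    indexOverride: null,
  };
}

// getSyntheticsIndexParams(true) yields the recovery document with a recoveryReason field.
```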

Additionally, the recovery message was missing when the alert resolved
because the monitor's status returned to up. This PR ensures that a
recovery message is still populated when the monitor comes back up.
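
A minimal sketch of that fix: the real logic lives in `setRecoveredAlertsContext`
(see the diff below); `buildRecoveryContext` and the simplified types are
illustrative only, and the alert-details URL and stale-config handling are
omitted.

```
// Minimal sketch of the recovery-context fix. When the recovered alert's
// monitor/location id is present in upConfigs (i.e. the monitor came back up),
// the context now carries an "up" status and a recovery reason, so every
// connector receives a fully populated recovery message.
interface RecoveredAlertState {
  idWithLocation?: string;
  monitorName?: string;
}

function buildRecoveryContext(
  state: RecoveredAlertState,
  upConfigs: Record<string, unknown>
): Record<string, unknown> {
  let isUp = false;
  let recoveryReason = '';

  if (state.idWithLocation && upConfigs[state.idWithLocation]) {
    isUp = true;
    recoveryReason = 'Monitor has recovered with status Up';
  }

  return {
    ...state,
    ...(isUp ? { status: 'up' } : {}),
    ...(recoveryReason ? { recoveryReason } : {}),
  };
}

// Example: a monitor at "US Central" that came back up
const context = buildRecoveryContext(
  { idWithLocation: '12345-US Central', monitorName: 'Badssl' },
  { '12345-US Central': { status: 'up' } }
);
// => { idWithLocation: '12345-US Central', monitorName: 'Badssl',
//      status: 'up', recoveryReason: 'Monitor has recovered with status Up' }
```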

Example server log for down monitor
```
[2023-02-22T13:46:04.521-05:00][WARN ][plugins.actions.server-log] Server log: The monitor Badssl checking https://expired.badssl.com/ from US Central QA last ran at February 22, 2023 1:45 PM and is down. The last error received is: error executing step: page.goto: net::ERR_CERT_DATE_INVALID at https://expired.badssl.com/.
```
Example slack message for down monitor
<img width="1428" alt="Screen Shot 2023-02-22 at 1 47 57 PM"
src="https://user-images.githubusercontent.com/11356435/220729032-a15341cd-3123-4c28-a32e-e98ccf5dfaa8.png">

Example server log for recovery
```
[2023-02-22T13:16:45.058-05:00][INFO ][plugins.actions.server-log] Server log: The alert for the monitor Badssl checking https://expired.badssl.com/ from North America - US Central is no longer active: Monitor has recovered with status Up.
```

Example slack message for recovery
<img width="1148" alt="Screen Shot 2023-02-22 at 1 18 40 PM"
src="https://user-images.githubusercontent.com/11356435/220720380-8342af47-5de6-49fe-a58b-5bf4df460093.png">

Example index documents for recovery (monitor up, location removed, and monitor deleted)
```
{
        "_index": "test3",
        "_id": "WnZieoYBn2Xs-NzyH0X7",
        "_score": 1,
        "_source": {
          "monitorName": "https://elastic.co",
          "monitorUrl": "https://elastic.co/",
          "statusMessage": "up",
          "latestErrorMessage": "",
          "observerLocation": "North America - US Central",
          "recoveryReason": "Monitor has recovered with status Up"
        }
      }
```

```
{
        "_index": "test3",
        "_id": "z1RseoYBfB3Am9NRsxsx",
        "_score": 1,
        "_source": {
          "monitorName": "Beep",
          "monitorUrl": "about:blank",
          "statusMessage": "down",
          "latestErrorMessage": "",
          "observerLocation": "North America - US Central",
          "recoveryReason": "Location has been removed from the monitor"
        }
      }
```

```
{
        "_index": "test3",
        "_id": "rVRreoYBfB3Am9NRvRsu",
        "_score": 1,
        "_source": {
          "monitorName": "TCP monitor",
          "monitorUrl": "tcp://localhost:5601",
          "statusMessage": "down",
          "latestErrorMessage": "",
          "observerLocation": "North America - US Central",
          "recoveryReason": "Monitor has been deleted"
        }
      }
```

Testing
--

1. Create a default alert connector on the Synthetics settings page.
This issue in particular impacted the `index` connector, so it's
recommended to create an index connector. For smoke testing, it's also
good to create a server log connector or a Slack connector.
2. Create an always-down monitor, and ideally run it against two
locations (if using the dev service environment, you can use US Central
dev and US Central QA).
3. Wait for the alert to fire
4. Confirm that all your alert connectors are working with all content
populated. In particular, query against the index that you are using for
the index connector and confirm that all the fields in the document are
populated appropriately (a sample query is sketched after this list).
5. Remove one of the locations from your monitor
6. Wait for the alert for that location to resolve. Confirm that the
recovery message for that location was sent to all your connectors. In
particular, query against the index that you are using for the index
connector, find the recovery document, and ensure that all the fields
are populated appropriately.
7. Update the monitor so that it comes back up.
8. Wait for the alert to resolve. Confirm that the recovery message for
the remaining location was sent to all your alert connectors. In
particular, query against the index that you are using for the index
connector. Find the recovery document. Ensure that the reason message
mentions that the monitor is now up and the status is marked as up.
9. Update the monitor so that it goes down again.
10. Wait for the alert to fire
11. After the alert fires for the down monitor, delete the monitor
12. Wait for the alert to resolve. Confirm that the recovery message was
sent to all your alert connectors stating that the alert was resolved
because the monitor was deleted. In particular, query against the index
that you are using for the index connector. Find the recovery document.
Ensure that the reason message mentions that the monitor was deleted.
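
For the "query against the index" steps (4, 6, 8, and 12), here is a minimal
sketch of one way to surface the recovery documents. It assumes the index
connector writes to `test3` (as in the example documents above), the
`@elastic/elasticsearch` client, and a local cluster; adjust to your setup.

```
// Hypothetical helper for the index-connector verification steps: find the
// documents written by the recovery action (they carry a recoveryReason field)
// and print them so the populated fields can be checked.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function findRecoveryDocuments(index = 'test3') {
  const result = await client.search({
    index,
    query: { exists: { field: 'recoveryReason' } },
    size: 10,
  });
  for (const hit of result.hits.hits) {
    console.log(hit._id, hit._source);
  }
}

findRecoveryDocuments().catch(console.error);
```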

---------

Co-authored-by: Shahzad <shahzad31comp@gmail.com>
6 changed files with 430 additions and 11 deletions


@@ -8,9 +8,11 @@
import { populateAlertActions } from './alert_actions';
import { ActionConnector } from './types';
import { MONITOR_STATUS } from '../constants/uptime_alerts';
import { MONITOR_STATUS as SYNTHETICS_MONITOR_STATUS } from '../constants/synthetics_alerts';
import { MonitorStatusTranslations } from '../translations';
import { SyntheticsMonitorStatusTranslations } from './synthetics/translations';
describe('Alert Actions factory', () => {
describe('Legacy Alert Actions factory', () => {
it('generate expected action for pager duty', async () => {
const resp = populateAlertActions({
groupId: MONITOR_STATUS.id,
@@ -32,6 +34,7 @@ describe('Alert Actions factory', () => {
defaultRecoveryMessage: MonitorStatusTranslations.defaultRecoveryMessage,
defaultSubjectMessage: MonitorStatusTranslations.defaultSubjectMessage,
},
isLegacy: true,
});
expect(resp).toEqual([
{
@@ -57,6 +60,66 @@ describe('Alert Actions factory', () => {
]);
});
it('generate expected action for index', async () => {
const resp = populateAlertActions({
groupId: MONITOR_STATUS.id,
defaultActions: [
{
actionTypeId: '.index',
group: 'xpack.uptime.alerts.actionGroups.monitorStatus',
params: {
dedupKey: 'always-downxpack.uptime.alerts.actionGroups.monitorStatus',
eventAction: 'trigger',
severity: 'error',
summary: MonitorStatusTranslations.defaultActionMessage,
},
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
},
] as unknown as ActionConnector[],
translations: {
defaultActionMessage: MonitorStatusTranslations.defaultActionMessage,
defaultRecoveryMessage: MonitorStatusTranslations.defaultRecoveryMessage,
defaultSubjectMessage: MonitorStatusTranslations.defaultSubjectMessage,
},
isLegacy: true,
});
expect(resp).toEqual([
{
group: 'recovered',
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
params: {
documents: [
{
latestErrorMessage: '',
monitorName: '{{context.monitorName}}',
monitorUrl: '{{{context.monitorUrl}}}',
observerLocation: '{{context.observerLocation}}',
statusMessage:
'Alert for monitor {{context.monitorName}} with url {{{context.monitorUrl}}} from {{context.observerLocation}} has recovered',
},
],
indexOverride: null,
},
},
{
group: 'xpack.uptime.alerts.actionGroups.monitorStatus',
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
params: {
documents: [
{
latestErrorMessage: '{{{context.latestErrorMessage}}}',
monitorName: '{{context.monitorName}}',
monitorUrl: '{{{context.monitorUrl}}}',
observerLocation: '{{context.observerLocation}}',
statusMessage: '{{{context.statusMessage}}}',
},
],
indexOverride: null,
},
},
]);
});
it('generate expected action for slack action connector', async () => {
const resp = populateAlertActions({
groupId: MONITOR_STATUS.id,
@@ -104,3 +167,157 @@ describe('Alert Actions factory', () => {
]);
});
});
describe('Alert Actions factory', () => {
it('generate expected action for pager duty', async () => {
const resp = populateAlertActions({
groupId: SYNTHETICS_MONITOR_STATUS.id,
defaultActions: [
{
actionTypeId: '.pagerduty',
group: 'xpack.uptime.alerts.actionGroups.monitorStatus',
params: {
dedupKey: 'always-downxpack.uptime.alerts.actionGroups.monitorStatus',
eventAction: 'trigger',
severity: 'error',
summary: SyntheticsMonitorStatusTranslations.defaultActionMessage,
},
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
},
] as unknown as ActionConnector[],
translations: {
defaultActionMessage: SyntheticsMonitorStatusTranslations.defaultActionMessage,
defaultRecoveryMessage: SyntheticsMonitorStatusTranslations.defaultRecoveryMessage,
defaultSubjectMessage: SyntheticsMonitorStatusTranslations.defaultSubjectMessage,
},
});
expect(resp).toEqual([
{
group: 'recovered',
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
params: {
dedupKey: expect.any(String),
eventAction: 'resolve',
summary:
'The alert for the monitor {{context.monitorName}} checking {{{context.monitorUrl}}} from {{context.locationName}} is no longer active: {{context.recoveryReason}}.',
},
},
{
group: 'xpack.synthetics.alerts.actionGroups.monitorStatus',
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
params: {
dedupKey: expect.any(String),
eventAction: 'trigger',
severity: 'error',
summary: SyntheticsMonitorStatusTranslations.defaultActionMessage,
},
},
]);
});
it('generate expected action for index', async () => {
const resp = populateAlertActions({
groupId: SYNTHETICS_MONITOR_STATUS.id,
defaultActions: [
{
actionTypeId: '.index',
group: 'xpack.synthetics.alerts.actionGroups.monitorStatus',
params: {
dedupKey: 'always-downxpack.uptime.alerts.actionGroups.monitorStatus',
eventAction: 'trigger',
severity: 'error',
summary: SyntheticsMonitorStatusTranslations.defaultActionMessage,
},
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
},
] as unknown as ActionConnector[],
translations: {
defaultActionMessage: SyntheticsMonitorStatusTranslations.defaultActionMessage,
defaultRecoveryMessage: SyntheticsMonitorStatusTranslations.defaultRecoveryMessage,
defaultSubjectMessage: SyntheticsMonitorStatusTranslations.defaultSubjectMessage,
},
});
expect(resp).toEqual([
{
group: 'recovered',
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
params: {
documents: [
{
latestErrorMessage: '{{{context.latestErrorMessage}}}',
monitorName: '{{context.monitorName}}',
monitorUrl: '{{{context.monitorUrl}}}',
observerLocation: '{{context.locationName}}',
statusMessage: '{{{context.status}}}',
recoveryReason: '{{context.recoveryReason}}',
},
],
indexOverride: null,
},
},
{
group: 'xpack.synthetics.alerts.actionGroups.monitorStatus',
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
params: {
documents: [
{
latestErrorMessage: '{{{context.lastErrorMessage}}}',
monitorName: '{{context.monitorName}}',
monitorUrl: '{{{context.monitorUrl}}}',
observerLocation: '{{context.locationName}}',
statusMessage: '{{{context.status}}}',
},
],
indexOverride: null,
},
},
]);
});
it('generate expected action for slack action connector', async () => {
const resp = populateAlertActions({
groupId: SYNTHETICS_MONITOR_STATUS.id,
defaultActions: [
{
actionTypeId: '.pagerduty',
group: 'xpack.synthetics.alerts.actionGroups.monitorStatus',
params: {
dedupKey: 'always-downxpack.uptime.alerts.actionGroups.monitorStatus',
eventAction: 'trigger',
severity: 'error',
summary:
'Monitor {{context.monitorName}} with url {{{context.monitorUrl}}} from {{context.observerLocation}} {{{context.statusMessage}}} The latest error message is {{{context.latestErrorMessage}}}',
},
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
},
] as unknown as ActionConnector[],
translations: {
defaultActionMessage: SyntheticsMonitorStatusTranslations.defaultActionMessage,
defaultRecoveryMessage: SyntheticsMonitorStatusTranslations.defaultRecoveryMessage,
defaultSubjectMessage: SyntheticsMonitorStatusTranslations.defaultSubjectMessage,
},
});
expect(resp).toEqual([
{
group: 'recovered',
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
params: {
dedupKey: expect.any(String),
eventAction: 'resolve',
summary:
'The alert for the monitor {{context.monitorName}} checking {{{context.monitorUrl}}} from {{context.locationName}} is no longer active: {{context.recoveryReason}}.',
},
},
{
group: 'xpack.synthetics.alerts.actionGroups.monitorStatus',
id: 'f2a3b195-ed76-499a-805d-82d24d4eeba9',
params: {
dedupKey: expect.any(String),
eventAction: 'trigger',
severity: 'error',
summary: SyntheticsMonitorStatusTranslations.defaultActionMessage,
},
},
]);
});
});


@@ -43,11 +43,13 @@ export function populateAlertActions({
defaultEmail,
groupId,
translations,
isLegacy = false,
}: {
groupId: string;
defaultActions: ActionConnector[];
defaultEmail?: DefaultEmail;
translations: Translations;
isLegacy?: boolean;
}) {
const actions: RuleAction[] = [];
defaultActions.forEach((aId) => {
@@ -78,8 +80,8 @@
actions.push(recoveredAction);
break;
case INDEX_ACTION_ID:
action.params = getIndexActionParams(translations);
recoveredAction.params = getIndexActionParams(translations, true);
action.params = getIndexActionParams(translations, false, isLegacy);
recoveredAction.params = getIndexActionParams(translations, true, isLegacy);
actions.push(recoveredAction);
break;
case SERVICE_NOW_ACTION_ID:
@@ -119,8 +121,12 @@
return actions;
}
function getIndexActionParams(translations: Translations, recovery = false): IndexActionParams {
if (recovery) {
function getIndexActionParams(
translations: Translations,
recovery = false,
isLegacy = false
): IndexActionParams {
if (isLegacy && recovery) {
return {
documents: [
{
@@ -134,14 +140,45 @@ function getIndexActionParams(translations: Translations, recovery = false): Ind
indexOverride: null,
};
}
if (isLegacy) {
return {
documents: [
{
monitorName: '{{context.monitorName}}',
monitorUrl: '{{{context.monitorUrl}}}',
statusMessage: '{{{context.statusMessage}}}',
latestErrorMessage: '{{{context.latestErrorMessage}}}',
observerLocation: '{{context.observerLocation}}',
},
],
indexOverride: null,
};
}
if (recovery) {
return {
documents: [
{
monitorName: '{{context.monitorName}}',
monitorUrl: '{{{context.monitorUrl}}}',
statusMessage: '{{{context.status}}}',
latestErrorMessage: '{{{context.latestErrorMessage}}}',
observerLocation: '{{context.locationName}}',
recoveryReason: '{{context.recoveryReason}}',
},
],
indexOverride: null,
};
}
return {
documents: [
{
monitorName: '{{context.monitorName}}',
monitorUrl: '{{{context.monitorUrl}}}',
statusMessage: '{{{context.statusMessage}}}',
latestErrorMessage: '{{{context.latestErrorMessage}}}',
observerLocation: '{{context.observerLocation}}',
statusMessage: '{{{context.status}}}',
latestErrorMessage: '{{{context.lastErrorMessage}}}',
observerLocation: '{{context.locationName}}',
},
],
indexOverride: null,


@@ -87,6 +87,7 @@ export const createAlert = async ({
defaultRecoveryMessage: MonitorStatusTranslations.defaultRecoveryMessage,
defaultSubjectMessage: MonitorStatusTranslations.defaultSubjectMessage,
},
isLegacy: true,
});
const data: NewMonitorStatusAlert = {


@@ -4,9 +4,11 @@
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/
import { updateState } from './common';
import { alertsMock } from '@kbn/alerting-plugin/server/mocks';
import { IBasePath } from '@kbn/core/server';
import { updateState, setRecoveredAlertsContext } from './common';
import { SyntheticsCommonState } from '../../common/runtime_types/alert_rules/common';
import { StaleDownConfig } from './status_rule/status_rule_executor';
describe('updateState', () => {
let spy: jest.SpyInstance<string, []>;
@@ -180,3 +182,153 @@ describe('updateState', () => {
`);
});
});
describe('setRecoveredAlertsContext', () => {
const { alertFactory } = alertsMock.createRuleExecutorServices();
const { getRecoveredAlerts } = alertFactory.done();
const alertUuid = 'alert-id';
const location = 'US Central';
const configId = '12345';
const idWithLocation = `${configId}-${location}`;
const basePath = {
publicBaseUrl: 'https://localhost:5601',
} as IBasePath;
const getAlertUuid = () => alertUuid;
const upConfigs = {
[idWithLocation]: {
configId,
monitorQueryId: 'stale-config',
status: 'up',
location: '',
ping: {
'@timestamp': new Date().toISOString(),
} as StaleDownConfig['ping'],
timestamp: new Date().toISOString(),
},
};
it('sets context correctly when monitor is deleted', () => {
const setContext = jest.fn();
getRecoveredAlerts.mockReturnValue([
{
getId: () => alertUuid,
getState: () => ({
idWithLocation,
monitorName: 'test-monitor',
}),
setContext,
},
]);
const staleDownConfigs = {
[idWithLocation]: {
configId,
monitorQueryId: 'stale-config',
status: 'down',
location: 'location',
ping: {
'@timestamp': new Date().toISOString(),
} as StaleDownConfig['ping'],
timestamp: new Date().toISOString(),
isDeleted: true,
},
};
setRecoveredAlertsContext({
alertFactory,
basePath,
getAlertUuid,
spaceId: 'default',
staleDownConfigs,
upConfigs: {},
});
expect(setContext).toBeCalledWith({
idWithLocation,
alertDetailsUrl: 'https://localhost:5601/app/observability/alerts/alert-id',
monitorName: 'test-monitor',
recoveryReason: 'Monitor has been deleted',
});
});
it('sets context correctly when location is removed', () => {
const setContext = jest.fn();
getRecoveredAlerts.mockReturnValue([
{
getId: () => alertUuid,
getState: () => ({
idWithLocation,
monitorName: 'test-monitor',
}),
setContext,
},
]);
const staleDownConfigs = {
[idWithLocation]: {
configId,
monitorQueryId: 'stale-config',
status: 'down',
location: 'location',
ping: {
'@timestamp': new Date().toISOString(),
} as StaleDownConfig['ping'],
timestamp: new Date().toISOString(),
isLocationRemoved: true,
},
};
setRecoveredAlertsContext({
alertFactory,
basePath,
getAlertUuid,
spaceId: 'default',
staleDownConfigs,
upConfigs: {},
});
expect(setContext).toBeCalledWith({
idWithLocation,
alertDetailsUrl: 'https://localhost:5601/app/observability/alerts/alert-id',
monitorName: 'test-monitor',
recoveryReason: 'Location has been removed from the monitor',
});
});
it('sets context correctly when monitor is up', () => {
const setContext = jest.fn();
getRecoveredAlerts.mockReturnValue([
{
getId: () => alertUuid,
getState: () => ({
idWithLocation,
monitorName: 'test-monitor',
}),
setContext,
},
]);
const staleDownConfigs = {
[idWithLocation]: {
configId,
monitorQueryId: 'stale-config',
status: 'down',
location: 'location',
ping: {
'@timestamp': new Date().toISOString(),
} as StaleDownConfig['ping'],
timestamp: new Date().toISOString(),
isLocationRemoved: true,
},
};
setRecoveredAlertsContext({
alertFactory,
basePath,
getAlertUuid,
spaceId: 'default',
staleDownConfigs,
upConfigs,
});
expect(setContext).toBeCalledWith({
idWithLocation,
alertDetailsUrl: 'https://localhost:5601/app/observability/alerts/alert-id',
monitorName: 'test-monitor',
status: 'up',
recoveryReason: 'Monitor has recovered with status Up',
});
});
});


@@ -80,12 +80,14 @@ export const setRecoveredAlertsContext = ({
getAlertUuid,
spaceId,
staleDownConfigs,
upConfigs,
}: {
alertFactory: RuleExecutorServices['alertFactory'];
basePath?: IBasePath;
getAlertUuid?: (alertId: string) => string | null;
spaceId?: string;
staleDownConfigs: AlertOverviewStatus['staleDownConfigs'];
upConfigs: AlertOverviewStatus['upConfigs'];
}) => {
const { getRecoveredAlerts } = alertFactory.done();
for (const alert of getRecoveredAlerts()) {
@@ -95,6 +97,7 @@ export const setRecoveredAlertsContext = ({
const state = alert.getState() as SyntheticsCommonState;
let recoveryReason = '';
let isUp = false;
if (state?.idWithLocation && staleDownConfigs[state.idWithLocation]) {
const { idWithLocation } = state;
@@ -110,8 +113,16 @@
}
}
if (state?.idWithLocation && upConfigs[state.idWithLocation]) {
isUp = Boolean(upConfigs[state.idWithLocation]) || false;
recoveryReason = i18n.translate('xpack.synthetics.alerts.monitorStatus.upCheck', {
defaultMessage: `Monitor has recovered with status Up`,
});
}
alert.setContext({
...state,
...(isUp ? { status: 'up' } : {}),
...(recoveryReason ? { [RECOVERY_REASON]: recoveryReason } : {}),
...(basePath && spaceId && alertUuid
? { [ALERT_DETAILS_URL]: getAlertDetailsUrl(basePath, spaceId, alertUuid) }


@@ -84,7 +84,7 @@ export const registerSyntheticsStatusCheckRule = (
syntheticsMonitorClient
);
const { downConfigs, staleDownConfigs } = await statusRule.getDownChecks(
const { downConfigs, staleDownConfigs, upConfigs } = await statusRule.getDownChecks(
ruleState.meta?.downConfigs as OverviewStatus['downConfigs']
);
@@ -129,6 +129,7 @@
getAlertUuid,
spaceId,
staleDownConfigs,
upConfigs,
});
return {