Commit graph

5 commits

Author SHA1 Message Date
Carlos Crespo
860f8dbf13
[Serverless][Observability] Use roles-based testing - api_integration (#184654)
part of: [#184033](https://github.com/elastic/kibana/issues/184033)

## Summary

This PR changes the observability serverless tests (and the affected
security tests) so they no longer run with operator privileges. A sketch of
the role-scoped pattern follows the test steps below.


### How to test

- Follow the steps from
https://github.com/elastic/kibana/blob/main/x-pack/test_serverless/README.md#run-tests-on-mki
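
A minimal sketch of the role-scoped pattern these tests move toward; the service and method names (`svlUserManager`, `createM2mApiKeyWithRoleScope`, etc.) are assumptions modeled on the serverless FTR utilities described in that README, not code taken from this PR:

```ts
// Hedged sketch of a role-scoped api_integration test; service and method
// names are assumptions, not verbatim from this PR.
import type { FtrProviderContext } from '../../ftr_provider_context';

export default function ({ getService }: FtrProviderContext) {
  const svlUserManager = getService('svlUserManager');
  const supertestWithoutAuth = getService('supertestWithoutAuth');

  describe('example suite without operator privileges', () => {
    let roleAuthc: { apiKeyHeader: Record<string, string> };

    before(async () => {
      // Mint credentials scoped to a predefined role instead of the operator user.
      roleAuthc = await svlUserManager.createM2mApiKeyWithRoleScope('admin');
    });

    after(async () => {
      await svlUserManager.invalidateM2mApiKeyWithRoleScope(roleAuthc);
    });

    it('calls the API as a role-scoped user', async () => {
      await supertestWithoutAuth
        .get('/api/status')
        .set(roleAuthc.apiKeyHeader) // role-scoped auth header, no operator
        .set('kbn-xsrf', 'true')
        .expect(200);
    });
  });
}
```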
2024-06-07 08:34:28 -07:00
Kevin Delemme
36008b09eb
chore(slo): Add tests for historical summary api (#183648) 2024-05-17 09:32:04 -04:00
Chris Cowan
96dc2a5104
[SLO] Add dependencies for Burn Rate rule suppression (#177078)
## 🍒  Summary

This PR adds a rule dependency feature to the SLO Burn Rate rule to
enable rule suppression when one of the dependencies meets the
suppression criteria.

### 📟 Use case

When you add a rule dependency to your SLO Burn Rate rule, you will also
choose which action groups you want to suppress on. For example, if you
have rule `A` which depends on rule `B` and you want to suppress the
actions of rule `A` when rule `B` is triggering `Critical` or `High`,
you'd add rule `B` and pick the action groups `Critical` and `High`.
When rule `B` is triggering either of those action groups, ALL of the
actions for rule `A` will be suppressed.

When an action is suppressed, we trigger a `Suppressed` action group and
set the context variable `{context.suppressedAction}` to the action group
that would have been triggered if the rule hadn't been suppressed. This
allows users to create an "action" for `Suppressed` alerts so they can
still send notifications without waking up the team for a `Critical` or
`High` severity alert.
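
For example, a notification action attached to the `Suppressed` group could route to a low-urgency channel with a message template along these lines (a hedged example: Kibana action messages use mustache-style double braces, and `sloName`/`reason` are assumed context variables):

```
[Suppressed] Burn rate alert for {{context.sloName}}.
Would have fired as: {{context.suppressedAction}}
Reason: {{context.reason}}
```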

If you have 2 rules that use a group-by, then the suppression happens on
the intersection of the `slo.instanceId` values. For example, imagine we
have an Nginx Proxy in front of a Node.js web service and we've created an
availability SLO based on `status_code < 500` for both, grouped by
`url.domain`. When the Node.js app responds with a `500`, the Nginx Proxy's
SLO will start to degrade because of the Node.js service. The admins for
the Nginx Proxy would like to receive alerts only if the Node.js web
service is "healthy", so they've listed the Node.js burn rate rule as a
dependency to suppress on `Critical` or `High` burn rates.

When one of the domains, `you-got.mail`, starts to throw 500s and the burn
rate becomes `High`, the rule will suppress the alert for the
`you-got.mail` Nginx Proxy instance. If another Nginx domain, `box.mail`,
started throwing `502`s because of a misconfiguration, the alert would
trigger normally, because the `box.mail` instance of the rule dependency
for the Node.js web service is still healthy (i.e., not triggering
`Critical` or `High`).

Suppression between group-by SLOs and non-group-by SLOs works like this
(see the sketch after this list):

- If an SLO with a group-by depends on a non-group-by SLO, all instances of
the group-by will be suppressed.
- If an SLO without a group-by depends on an SLO with a group-by, the
non-grouped SLO will be suppressed if ANY of the instances of the group-by
are triggering the "suppress on" action groups.
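
A minimal TypeScript sketch of that set logic, assuming a non-grouped SLO shows up as a single `'*'` instance; names are illustrative, not the implementation in this PR:

```ts
// Illustrative sketch of the suppression set logic above; not the shipped code.
type InstanceId = string;
const ALL_VALUE = '*'; // a non-grouped SLO has a single "all" instance

function suppressedInstances(
  primary: Set<InstanceId>, // instances the primary rule wants to alert on
  dependency: Set<InstanceId> // dependency instances in a "suppress on" group
): Set<InstanceId> {
  // Dependency has no group-by: suppress every primary instance.
  if (dependency.has(ALL_VALUE)) return new Set(primary);
  // Primary has no group-by: suppress if ANY dependency instance is triggering.
  if (primary.has(ALL_VALUE)) return dependency.size > 0 ? new Set(primary) : new Set();
  // Both grouped: suppress the intersection of the instance ids.
  return new Set([...primary].filter((id) => dependency.has(id)));
}
```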

### 💻 Screenshots

Adding a rule dependency for MongoDB to a Node.js web app


In this scenario, the Nginx Proxy routes to the Admin Console, which reads
data from MongoDB. The connection between MongoDB and the Admin Console has
a network outage, which causes the MongoDB rule to trigger a `Critical`
action group and suppresses the `Critical` action for the Admin Console.
The Admin Console also goes `Critical`, which in turn suppresses the rule
for the Nginx Proxy.


### ⚙️ How it works

- Execute the primary rule and evaluate whether it should trigger any actions
- If the primary rule is triggering, execute each of the dependencies
(in the same process, using the same function) and suppress as follows
(sketched below):
  - For group-by SLOs that depend on another group-by SLO, we suppress the
intersection of the instanceIds.
  - For group-by SLOs that depend on a non-group-by SLO, we suppress all
the instanceIds.
  - For a non-group-by SLO that depends on a group-by SLO, we suppress if
ANY instanceId matches (not recommended).
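
To make the execution order concrete, here is a rough sketch of that flow; the types and names are assumptions, not the actual rule executor:

```ts
// Rough sketch of the executor flow above; names are assumptions, not Kibana's API.
interface Dependency {
  ruleId: string;
  suppressOn: string[]; // action groups to suppress on, e.g. ['critical', 'high']
}

interface Evaluation {
  instanceId: string;
  actionGroup: string; // e.g. 'critical' | 'high' | 'medium' | 'low'
}

async function executeWithSuppression(
  evaluateRule: (ruleId: string) => Promise<Evaluation[]>,
  primaryRuleId: string,
  dependencies: Dependency[]
): Promise<Array<Evaluation & { suppressed: boolean }>> {
  // 1. Execute the primary rule and see whether it triggers anything.
  const primary = await evaluateRule(primaryRuleId);
  if (primary.length === 0) return [];

  // 2. Only then evaluate each dependency, in the same process with the same function.
  const triggeredByDeps = new Set<string>();
  for (const dep of dependencies) {
    for (const result of await evaluateRule(dep.ruleId)) {
      if (dep.suppressOn.includes(result.actionGroup)) {
        triggeredByDeps.add(result.instanceId);
      }
    }
  }

  // 3. Suppress primary evaluations whose instanceId matches (see the set logic above).
  return primary.map((p) => ({ ...p, suppressed: triggeredByDeps.has(p.instanceId) }));
}
```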

### 🔬 How to test

- Add the following lines to your `config/kibana.dev.yml`:
  - `server.basePath: '/kibana'`
  - `server.publicBaseUrl: 'http://localhost:5601/kibana'`
- Start with the following command: `node x-pack/scripts/data_forge.js --events-per-cycle 50 --lookback now-1d --dataset fake_stack --install-kibana-assets --kibana-url http://localhost:5601/kibana --event-template good`
- Wait until the log message says `info Waiting 60000ms`
- Create 2 SLOs (an equivalent API sketch follows these steps):
  - "Admin Console Availability" using the "Custom Query" SLI with the
`Admin Console` DataView; set the "Good query" to
`http.response.status_code < 500` and the "Total query" to
`http.response.status_code: *`, using a rolling `7d` time window
  - "MongoDB Availability" using the "Custom Query" SLI with the
`Heartbeat` DataView; set the "Good query" to `event.outcome: "success"`
and the "Total query" to `event.outcome: *`, using a rolling `7d` time
window
- You should have 2 burn rate rules that were created by default
- Open the "Admin Console Availability Burn Rate rule" and add the
"MongoDB Availability Burn Rate rule" as a dependency with the `Critical`
and `High` action groups to "Suppress on"
- Save the rule
- Stop the first `data_forge.js` command
- Start `node x-pack/scripts/data_forge.js --events-per-cycle 50 --lookback now --dataset fake_stack --install-kibana-assets --kibana-url http://localhost:5601/kibana --event-template bad`
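
If you would rather script the SLO setup, the same two SLOs can be created through the SLO HTTP API. The sketch below is hedged: the payload shape follows the public SLO create API as I understand it, and the index patterns are guesses at what the `fake_stack` dataset writes, so verify both against your environment:

```ts
// Hedged sketch: create the two SLOs via POST /api/observability/slos.
// Payload shape and index patterns are assumptions; verify before use.
const KIBANA = 'http://localhost:5601/kibana';
const AUTH = 'Basic ' + Buffer.from('elastic:changeme').toString('base64');

async function createSlo(name: string, index: string, good: string, total: string) {
  const res = await fetch(`${KIBANA}/api/observability/slos`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'kbn-xsrf': 'true', // Kibana requires this header on write requests
      Authorization: AUTH,
    },
    body: JSON.stringify({
      name,
      description: `${name} for burn rate suppression testing`,
      indicator: {
        type: 'sli.kql.custom',
        params: { index, good, total, timestampField: '@timestamp' },
      },
      timeWindow: { duration: '7d', type: 'rolling' },
      budgetingMethod: 'occurrences',
      objective: { target: 0.99 },
    }),
  });
  if (!res.ok) throw new Error(`SLO create failed: ${res.status} ${await res.text()}`);
  return res.json();
}

// Index patterns below are guesses at the fake_stack indices.
await createSlo('Admin Console Availability', 'kbn-data-forge-fake_stack.admin-console-*',
  'http.response.status_code < 500', 'http.response.status_code: *');
await createSlo('MongoDB Availability', 'kbn-data-forge-fake_stack.heartbeat-*',
  'event.outcome: "success"', 'event.outcome: *');
```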

Once the burn rate rules go `Critical`, the "MongoDB Availability Burn Rate
rule" reason message should start with `CRITICAL:...` and the "Admin
Console Availability Burn Rate rule" reason message should start with
`SUPPRESSED - CRITICAL: ...`

Fixes #173653

---------

Co-authored-by: Panagiota Mitsopoulou <giota85@gmail.com>
Co-authored-by: Dominique Clarke <doclarke71@gmail.com>
Co-authored-by: Kevin Delemme <kdelemme@gmail.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co>
2024-04-16 06:25:50 -04:00
Panagiota Mitsopoulou
33ca9ece68
[SLO] serverless integration tests (#172786)
## 🍒 Summary
This PR adds basic serverless integration tests for SLO and covers 2
scenarios:
- SLO creation 
- SLO deletion 

There is another PR that adds SLO integration tests for stateful and
covers more scenarios:
- create
- delete
- update
- reset
- get/find

This PR covers only the create and delete scenarios. I want to check the
flakiness of these tests before introducing new ones. Another reason for
not covering all scenarios here is that we don't want to duplicate effort
with @dominiqueclarke's
[PR](https://github.com/elastic/kibana/pull/173236). Once the stateful
tests are reviewed and merged, we can come up with a plan for how (and
whether) to continue with serverless tests and more scenarios.

**TODO**: Create a GitHub issue to track the SLO integration tests effort.
We also need to add privilege- and space-related test cases.

---------

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2024-01-16 06:56:53 -07:00
Xavier Mouligneau
a35f91e3a5
[RAM] add observability feature for serverless (#168636)
## Summary

Fixes: https://github.com/elastic/kibana/issues/168034


### Checklist

- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: mgiota <panagiota.mitsopoulou@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2023-10-31 14:27:53 -07:00