# Backport
This will backport the following commits from `main` to `8.8`:
- [[TaskManager] log health on interval with background_tasks only role
(#158890)](https://github.com/elastic/kibana/pull/158890)
### Questions?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)
<!--BACKPORT [{"author":{"name":"Patrick
Mueller","email":"patrick.mueller@elastic.co"},"sourceCommit":{"committedDate":"2023-06-06T12:42:40Z","message":"[TaskManager]
log health on interval with background_tasks only role
(#158890)\n\nresolves
https://github.com/elastic/kibana/issues/158870\r\n\r\n##
Summary\r\n\r\nFor Kibana servers that only have node role
`background_tasks`, log the\r\ntask manager health report to the Kibana
logs on an interval, currently\r\nevery hour.\r\n\r\nCo-authored-by:
Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"837ef26fb0cced40214b25f0f1f22a8a0d610fb2","branchLabelMapping":{"^v8.9.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["enhancement","release_note:skip","Feature:Task
Manager","Team:ResponseOps","backport:prev-minor","v8.9.0","v8.8.2"],"number":158890,"url":"https://github.com/elastic/kibana/pull/158890","mergeCommit":{"message":"[TaskManager]
log health on interval with background_tasks only role
(#158890)\n\nresolves
https://github.com/elastic/kibana/issues/158870\r\n\r\n##
Summary\r\n\r\nFor Kibana servers that only have node role
`background_tasks`, log the\r\ntask manager health report to the Kibana
logs on an interval, currently\r\nevery hour.\r\n\r\nCo-authored-by:
Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"837ef26fb0cced40214b25f0f1f22a8a0d610fb2"}},"sourceBranch":"main","suggestedTargetBranches":["8.8"],"targetPullRequestStates":[{"branch":"main","label":"v8.9.0","labelRegex":"^v8.9.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/158890","number":158890,"mergeCommit":{"message":"[TaskManager]
log health on interval with background_tasks only role
(#158890)\n\nresolves
https://github.com/elastic/kibana/issues/158870\r\n\r\n##
Summary\r\n\r\nFor Kibana servers that only have node role
`background_tasks`, log the\r\ntask manager health report to the Kibana
logs on an interval, currently\r\nevery hour.\r\n\r\nCo-authored-by:
Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"837ef26fb0cced40214b25f0f1f22a8a0d610fb2"}},{"branch":"8.8","label":"v8.8.2","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->
# Backport
This will backport the following commits from `main` to `8.8`:
- [[ResponseOps][Task Manager] stop spamming the logs on status changes
(#157762)](https://github.com/elastic/kibana/pull/157762)
### Questions?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)
<!--BACKPORT [{"author":{"name":"Patrick
Mueller","email":"patrick.mueller@elastic.co"},"sourceCommit":{"committedDate":"2023-05-15T20:21:16Z","message":"[ResponseOps][Task
Manager] stop spamming the logs on status changes (#157762)\n\nresolves
https://github.com/elastic/kibana/issues/156112\r\n\r\nChange task
manager logging on status errors from `warn` to `debug.\r\nMaking this
change as we recently changed from `debug` to `warn`
in\r\nhttps://github.com/elastic/kibana/pull/154045 . But this ended up
too\r\nnoisy, especially at Kibana
startup.","sha":"b542862904073982a00ecd7418cc77dbe567b2d0","branchLabelMapping":{"^v8.9.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Feature:Task
Manager","Team:ResponseOps","v8.8.0","v8.9.0"],"number":157762,"url":"https://github.com/elastic/kibana/pull/157762","mergeCommit":{"message":"[ResponseOps][Task
Manager] stop spamming the logs on status changes (#157762)\n\nresolves
https://github.com/elastic/kibana/issues/156112\r\n\r\nChange task
manager logging on status errors from `warn` to `debug.\r\nMaking this
change as we recently changed from `debug` to `warn`
in\r\nhttps://github.com/elastic/kibana/pull/154045 . But this ended up
too\r\nnoisy, especially at Kibana
startup.","sha":"b542862904073982a00ecd7418cc77dbe567b2d0"}},"sourceBranch":"main","suggestedTargetBranches":["8.8"],"targetPullRequestStates":[{"branch":"8.8","label":"v8.8.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.9.0","labelRegex":"^v8.9.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/157762","number":157762,"mergeCommit":{"message":"[ResponseOps][Task
Manager] stop spamming the logs on status changes (#157762)\n\nresolves
https://github.com/elastic/kibana/issues/156112\r\n\r\nChange task
manager logging on status errors from `warn` to `debug.\r\nMaking this
change as we recently changed from `debug` to `warn`
in\r\nhttps://github.com/elastic/kibana/pull/154045 . But this ended up
too\r\nnoisy, especially at Kibana
startup.","sha":"b542862904073982a00ecd7418cc77dbe567b2d0"}}]}]
BACKPORT-->
Co-authored-by: Patrick Mueller <patrick.mueller@elastic.co>
## Description
Fix https://github.com/elastic/kibana/issues/104081
This PR moves some of the SO types from the `.kibana` index into the
following ones:
- `.kibana_alerting_cases`
- `.kibana_analytics`
- `.kibana_security_solution`
- `.kibana_ingest`
This split/reallocation will occur during the `8.8.0` Kibana upgrade
(*meaning: from any version older than `8.8.0` to any version greater
than or equal to `8.8.0`*).
**This PR's main changes are:**
- implement the changes required in the SO migration algorithm to
support this reallocation
- update the FTR tools (looking at you esArchiver) to support these new
indices
- update hardcoded references to `.kibana` and usages of
`core.savedObjects.getKibanaIndex()` to use new APIs that target the
correct index/indices (see the sketch after this list)
- update FTR datasets, tests and utilities accordingly
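As a rough illustration of the kind of change involved (the accessor names below are stand-ins assumed for illustration, not necessarily the exact APIs this PR introduces), code that used to hardcode `.kibana` now asks core for the index that owns a given type:
```ts
// Minimal sketch with simplified stand-in types; `getIndexForType` and
// `getAllIndices` are assumed names for whatever accessor core exposes.
interface SavedObjectsIndexResolver {
  getIndexForType(type: string): string;
  getAllIndices(): string[];
}

function indicesForTypes(resolver: SavedObjectsIndexResolver, types: string[]): string[] {
  // Resolve the owning index per type and de-duplicate, since several types
  // can share one index after the split.
  return Array.from(new Set(types.map((type) => resolver.getIndexForType(type))));
}

// Usage: a telemetry collector that previously searched '.kibana' directly
// would now pass indicesForTypes(resolver, ['dashboard', 'alert']) to its
// Elasticsearch request.
```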
## To reviewers
**Overall estimated risk of regressions: low**
But, still, please take the time to review changes in your code. The
parts of the production code that were the most impacted are the
telemetry collectors, as most of them were performing direct requests
against the `.kibana` index, so we had to adapt them. Most other
contributor-owned changes are in FTR tests and datasets.
If you think a type is misplaced (either we missed some types that
should be moved to a specific index, or some types were moved and
shouldn't have been) please tell us, and we'll fix the reallocation
either in this PR or in a follow-up.
## `.kibana` split
The following new indices are introduced by this PR, with the listed SO
types being moved to them (any SO type not listed here stays in its
current index).
Note: The complete **_type => index_** breakdown is available in [this
spreadsheet](https://docs.google.com/spreadsheets/d/1b_MG_E_aBksZ4Vkd9cVayij1oBpdhvH4XC8NVlChiio/edit#gid=145920788).
#### `.kibana_alerting_cases`
- action
- action_task_params
- alert
- api_key_pending_invalidation
- cases
- cases-comments
- cases-configure
- cases-connector-mappings
- cases-telemetry
- cases-user-actions
- connector_token
- rules-settings
- maintenance-window
#### `.kibana_security_solution`
- csp-rule-template
- endpoint:user-artifact
- endpoint:user-artifact-manifest
- exception-list
- exception-list-agnostic
- osquery-manager-usage-metric
- osquery-pack
- osquery-pack-asset
- osquery-saved-query
- security-rule
- security-solution-signals-migration
- siem-detection-engine-rule-actions
- siem-ui-timeline
- siem-ui-timeline-note
- siem-ui-timeline-pinned-event
#### `.kibana_analytics`
- canvas-element
- canvas-workpad-template
- canvas-workpad
- dashboard
- graph-workspace
- index-pattern
- kql-telemetry
- lens
- lens-ui-telemetry
- map
- search
- search-session
- search-telemetry
- visualization
#### `.kibana_ingest`
- epm-packages
- epm-packages-assets
- fleet-fleet-server-host
- fleet-message-signing-keys
- fleet-preconfiguration-deletion-record
- fleet-proxy
- ingest_manager_settings
- ingest-agent-policies
- ingest-download-sources
- ingest-outputs
- ingest-package-policies
## Tasks / PRs
### Sub-PRs
**Implementation**
- 🟣https://github.com/elastic/kibana/pull/154846
- 🟣https://github.com/elastic/kibana/pull/154892
- 🟣https://github.com/elastic/kibana/pull/154882
- 🟣https://github.com/elastic/kibana/pull/154884
- 🟣https://github.com/elastic/kibana/pull/155155
**Individual index split**
- 🟣https://github.com/elastic/kibana/pull/154897
- 🟣https://github.com/elastic/kibana/pull/155129
- 🟣https://github.com/elastic/kibana/pull/155140
- 🟣https://github.com/elastic/kibana/pull/155130
### Improvements / follow-ups
- 👷🏼 Extract logic into
[runV2Migration](https://github.com/elastic/kibana/pull/154151#discussion_r1158470566)
@gsoldevila
- Make `getCurrentIndexTypesMap` resilient to intermittent failures
https://github.com/elastic/kibana/pull/154151#discussion_r1169289717
- 🚧 Build a more structured
[MigratorSynchronizer](https://github.com/elastic/kibana/pull/154151#discussion_r1158469918)
- 🟣https://github.com/elastic/kibana/pull/155035
- 🟣https://github.com/elastic/kibana/pull/155116
- 🟣https://github.com/elastic/kibana/pull/155366
## Reallocation tweaks
Tweaks to the reallocation can be done after the initial merge, as long
as they land before the public release of 8.8:
- `url` should go back to `.kibana` (see
[comment](https://github.com/elastic/kibana/pull/154888#discussion_r1172317133))
## Release Note
For performance purposes, Kibana is now using more system indices to
store its internal data.
The following system indices will be created when upgrading to `8.8.0`:
- `.kibana_alerting_cases`
- `.kibana_analytics`
- `.kibana_security_solution`
- `.kibana_ingest`
---------
Co-authored-by: pgayvallet <pierre.gayvallet@elastic.co>
Co-authored-by: Christos Nasikas <christos.nasikas@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Georgii Gorbachev <georgii.gorbachev@elastic.co>
Resolves https://github.com/elastic/kibana/issues/151457.
In this PR, I'm deprecating ephemeral tasks and their related settings.
The following settings have been deprecated with proper warning
messages:
- `xpack.task_manager.ephemeral_tasks.enabled`
- `xpack.task_manager.ephemeral_tasks.request_capacity`
- `xpack.alerting.maxEphemeralActionsPerAlert`
## To verify
1. Set the following in your `kibana.yml`
```
xpack.task_manager.ephemeral_tasks.enabled: true
xpack.task_manager.ephemeral_tasks.request_capacity: 10
xpack.alerting.maxEphemeralActionsPerAlert: 10
```
2. Start up Kibana
3. Notice the deprecation warnings about these settings appear in the
logs
4. Remove settings from step 1
## Sample warning logs
```
[2023-04-18T09:45:36.731-04:00][WARN ][config.deprecation] Configuring "xpack.alerting.maxEphemeralActionsPerAlert" is deprecated and will be removed in a future version. Remove this setting to increase action execution resiliency.
[2023-04-18T09:45:36.732-04:00][WARN ][config.deprecation] Configuring "xpack.task_manager.ephemeral_tasks.enabled" is deprecated and will be removed in a future version. Remove this setting to increase task execution resiliency.
[2023-04-18T09:45:36.732-04:00][WARN ][config.deprecation] Configuring "xpack.task_manager.ephemeral_tasks.request_capacity" is deprecated and will be removed in a future version. Remove this setting to increase task execution resiliency.
```
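These warnings come from config deprecations registered for the affected settings. A minimal, non-authoritative sketch of that pattern, using simplified stand-in types rather than Kibana's actual deprecation provider interfaces:
```ts
// Sketch only: the context shape is an assumption loosely modelled on
// Kibana's config deprecation providers.
interface DeprecationContext {
  addDeprecation(details: {
    message: string;
    correctiveActions: { manualSteps: string[] };
  }): void;
}

const DEPRECATED_KEYS = [
  'xpack.task_manager.ephemeral_tasks.enabled',
  'xpack.task_manager.ephemeral_tasks.request_capacity',
  'xpack.alerting.maxEphemeralActionsPerAlert',
];

// Flag any of the deprecated keys that are present in the (flattened) settings.
function warnIfEphemeralSettingsConfigured(
  settings: Record<string, unknown>,
  ctx: DeprecationContext
): void {
  for (const key of DEPRECATED_KEYS) {
    if (settings[key] !== undefined) {
      ctx.addDeprecation({
        message: `Configuring "${key}" is deprecated and will be removed in a future version.`,
        correctiveActions: { manualSteps: [`Remove "${key}" from kibana.yml.`] },
      });
    }
  }
}
```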
### Release notes
The following settings have been deprecated. Remove them to increase
task execution resiliency.
- `xpack.task_manager.ephemeral_tasks.enabled`
- `xpack.task_manager.ephemeral_tasks.request_capacity`
- `xpack.alerting.maxEphemeralActionsPerAlert`
---------
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: lcawl <lcawley@elastic.co>
## Summary
This PR is just the first phase of Response Ops going through their
saved object attributes. The idea is to comment out all the attributes
that we all agree we do not need to filter/search/sort/aggregate on.
In a second phase/PR, we will create a new file that represents all of
the attributes in our saved objects as a source of truth. Then we will
generate our SO mappings from this source of truth to register our
saved objects.
In phase 3, we will also try to generate our types from the source of
truth.
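To make that plan concrete, here is a hypothetical sketch of the source-of-truth idea (every name here is invented for illustration): declare each attribute once, mark whether it needs to be indexed, and derive the registered mappings from that declaration.
```ts
// Hypothetical source of truth for one saved object type's attributes.
interface AttributeSpec {
  type: 'keyword' | 'text' | 'date' | 'long';
  // false => left out of the mappings (not filterable/sortable/aggregatable)
  indexed: boolean;
}

const ruleAttributes: Record<string, AttributeSpec> = {
  name: { type: 'keyword', indexed: true },
  enabled: { type: 'keyword', indexed: true },
  notifyWhen: { type: 'keyword', indexed: false },
};

// Derive the SO mappings from the declaration, keeping only indexed attributes.
function buildMappings(attrs: Record<string, AttributeSpec>) {
  const properties: Record<string, { type: string }> = {};
  for (const [name, spec] of Object.entries(attrs)) {
    if (spec.indexed) {
      properties[name] = { type: spec.type };
    }
  }
  return { properties };
}

// buildMappings(ruleAttributes)
// => { properties: { name: { type: 'keyword' }, enabled: { type: 'keyword' } } }
```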
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Resolves: #152289
With this PR, we make some of the `debug` logs `warn` and return the
message to the health API as a reason to include in the status summary
message.
resolves https://github.com/elastic/kibana/issues/142874
The alerting framework now generates an alert UUID for every alert it
creates. The UUID will be reused for alerts which continue to be active
on subsequent runs, until the alert recovers. When the same alert (alert
instance id) becomes active again, a new UUID will be generated. These
UUIDs then identify a "span" of events for a single alert.
The rule registry plugin was already adding these UUIDs to its own
alerts-as-data indices, and that code has now been changed to make use
of the new UUID the alerting framework generates.
- adds property in the rule task state
`alertInstances[alertInstanceId].meta.uuid`; this is where the alert
UUID is persisted across runs
- adds a new `Alert` method `getUuid(): string` that can be used by rule
executors to obtain the UUID of the alert they just retrieved from the
factory (see the sketch after this list); the rule registry uses this to
get the UUID generated by the alerting framework
- for the event log, adds the property `kibana.alert.uuid` to
`*-instance` event log events; this is the same field the rule registry
writes into the alerts-as-data indices
- various changes to tests to accommodate new UUID data / methods
- migrates the UUID previously stored with lifecycle alerts in the alert
state via the rule registry *INTO* the new `meta.uuid` field in the
existing alert state.
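A rough sketch of how a rule executor might consume the new method; the factory shape below is a simplified stand-in for the executor services, not code from this PR:
```ts
// Simplified stand-ins for the alert factory a rule executor receives.
interface AlertLike {
  getUuid(): string;
  scheduleActions(group: string, context?: Record<string, unknown>): void;
}
interface AlertFactoryLike {
  create(alertInstanceId: string): AlertLike;
}

function reportThresholdBreached(factory: AlertFactoryLike, hostName: string): string {
  const alert = factory.create(hostName);
  // The framework-generated UUID stays stable for the life of this alert
  // "span" (until it recovers), so it can also be written into
  // alerts-as-data documents.
  const uuid = alert.getUuid();
  alert.scheduleActions('default', { alertUuid: uuid });
  return uuid;
}
```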
This PR is preparation for
https://github.com/elastic/kibana/issues/151938 to avoid having a large
PR to review all at once. Here, I'm removing support for on-demand
polling (`pollRequests$`). This code path was used when we supported
`runNow`, and is no longer used after transitioning to `runSoon`. There
is a lot of underlying code related to this within the `task_poller.ts`
file to support that capability, including a queue of events
(combination of `pushOptionalIntoSet` and `pullFromSet`) before calling
`concatMap` that is no longer needed.
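With on-demand polling gone, the poller reduces to a plain interval pipeline. A minimal RxJS sketch of that shape (assuming RxJS 7; not the actual `task_poller.ts` code):
```ts
import { Subscription, concatMap, from, interval } from 'rxjs';

// Poll on a fixed interval and never overlap polls: concatMap waits for the
// previous pollForWork() to settle before starting the next one, so no
// pollRequests$ queue is needed.
export function startPolling(
  pollIntervalMs: number,
  pollForWork: () => Promise<unknown>
): Subscription {
  return interval(pollIntervalMs)
    .pipe(concatMap(() => from(pollForWork())))
    .subscribe();
}

// Usage (roughly what the simplified lifecycle does):
//   const subscription = startPolling(3000, () => pollForWork());
// Stop polling by calling subscription.unsubscribe().
```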
## To verify
Use the following to log timestamps whenever task manager is polling for
work
```
--- a/x-pack/plugins/task_manager/server/polling_lifecycle.ts
+++ b/x-pack/plugins/task_manager/server/polling_lifecycle.ts
@@ -243,6 +243,7 @@ export class TaskPollingLifecycle {
}
private pollForWork = async (): Promise<TimedFillPoolResult> => {
+ console.log('*** pollForWork', new Date().toISOString());
return fillPool(
// claim available tasks
() => {
```
1. Ensure that when Kibana is running, pollForWork is logged at 3s
intervals
2. Modify the code to slow down the updateByQuery to 5s and notice that
the next pollForWork isn't called until two update by queries have run
(main pool and reporting); it should now be every 10s instead of 3s.
```
--- a/x-pack/plugins/task_manager/server/task_store.ts
+++ b/x-pack/plugins/task_manager/server/task_store.ts
@@ -444,6 +444,7 @@ export class TaskStore {
// eslint-disable-next-line @typescript-eslint/naming-convention
{ max_docs: max_docs }: UpdateByQueryOpts = {}
): Promise<UpdateByQueryResult> {
+ await new Promise((resolve) => setTimeout(resolve, 5000));
const { query } = ensureQueryOnlyReturnsTaskObjects(opts);
try {
const // eslint-disable-next-line @typescript-eslint/naming-convention
```
3. Slow down the `setTimeout` added in step two by setting it to `45000`
and observe pollForWork getting called at a 30s interval with some
timeout errors getting logged (same as main)
4. Undo the code changes from step two in favour of this one, and observe
pollForWork getting called at 3s intervals even though an error is
thrown
```
--- a/x-pack/plugins/task_manager/server/task_store.ts
+++ b/x-pack/plugins/task_manager/server/task_store.ts
@@ -444,6 +444,7 @@ export class TaskStore {
// eslint-disable-next-line @typescript-eslint/naming-convention
{ max_docs: max_docs }: UpdateByQueryOpts = {}
): Promise<UpdateByQueryResult> {
+ throw new Error('oh no!');
const { query } = ensureQueryOnlyReturnsTaskObjects(opts);
try {
const // eslint-disable-next-line @typescript-eslint/naming-convention
```
---------
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Part of https://github.com/elastic/kibana/issues/79977 (steps 1 and 3).
In this PR, I'm making Task Manager remove tasks instead of updating
them with `status: failed` whenever a task is out of attempts. I've also
added an optional `cleanup` hook to the task runner that can be defined
if additional cleanup is necessary whenever a task has been deleted (ex:
delete `action_task_params`).
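A hedged sketch of what such a `cleanup` hook could look like on a task runner; the surrounding shape is a simplified assumption, and only the hook itself comes from the PR description:
```ts
// Simplified stand-ins for task manager's runner types.
interface RunResult {
  state: Record<string, unknown>;
}

interface TaskRunnerLike {
  run(): Promise<RunResult>;
  // Optional hook, invoked after the task document has been deleted (for
  // example when an ad-hoc task runs out of attempts), so related saved
  // objects can be removed as well.
  cleanup?(): Promise<void>;
}

const actionTaskRunner: TaskRunnerLike = {
  async run() {
    // ... execute the action ...
    return { state: {} };
  },
  async cleanup() {
    // ... delete the associated action_task_params saved object ...
  },
};
```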
## To verify an ad-hoc task that always fails
1. With this PR codebase, modify an action to always throw an error
2. Create an alerting rule that will invoke the action once
3. See the action fail three times
4. Observe the task SO is deleted (search by task type / action type)
alongside the action_task_params SO
## To verify Kibana crashing on the last ad-hoc task attempt
1. With this PR codebase, modify an action to always throw an error
(similar to the scenario above) but also add a delay of 10s before the
error is thrown (`await new Promise((resolve) => setTimeout(resolve, 10000));`)
and a log message before the delay begins
2. Create an alerting rule that will invoke the action once
3. See the action fail twice
4. On the third run, crash Kibana while the action is waiting for the
10s delay; this will cause the action to still be marked as running
while it no longer is
5. Restart Kibana
6. Wait 5-10m until the task's retryAt is overdue
7. Observe the task getting deleted and the action_task_params getting
deleted
## To verify recurring tasks that continuously fail
1. With this PR codebase, modify a rule type to always throw an error
when it runs
2. Create an alerting rule of that type (with a short interval)
3. Observe the rule continuously running and not being caught by the PR
changes (its recurring task is not deleted)
Flaky test runner:
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/2036
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
fixes: #140867
Our task manager collects all the due tasks, executes their rules and
actions, and updates the tasks at the end.
Because executing rules and actions may take a long time and the tasks
are not fetched again before saving, updating a rule and its task
sometimes produces a 409 conflict error.
To avoid this, I disable the rule before updating it and wait 3 seconds
to be sure that the ongoing task execution cycle is done. Then I enable
it again and expect it to run successfully.
Flaky test runner (there is one failure but it's irrelevant):
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/2008#_
Fixes https://github.com/elastic/kibana/issues/140973
Fixes https://github.com/elastic/kibana-team/issues/563
In this PR, I'm fixing flaky tests that caused extra telemetry runs
whenever CI would run them near midnight UTC. The assertion expected two
runs, but sometimes a 3rd run would happen if the test ran near midnight,
when the telemetry task was scheduled to run again.
To fix this, I've moved away from the midnight scheduling, given that
telemetry only needs to be reported daily, and switched the task to use a
`schedule` within task manager so it runs daily (+24hrs from the
previous run). This also improves error handling, since task manager
will now know it's a recurring task, and recurring tasks never get marked
as `failed`.
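A minimal sketch of scheduling the telemetry task as a recurring task via `ensureScheduled`; the contract below is a simplified stand-in and the id/params are illustrative:
```ts
// Simplified stand-in for the task manager start contract.
interface TaskManagerStartLike {
  ensureScheduled(task: {
    id: string;
    taskType: string;
    schedule: { interval: string };
    state: Record<string, unknown>;
    params: Record<string, unknown>;
  }): Promise<unknown>;
}

export async function scheduleTelemetryTask(taskManager: TaskManagerStartLike) {
  await taskManager.ensureScheduled({
    id: 'actions_telemetry', // illustrative id
    taskType: 'actions_telemetry',
    // A recurring schedule (+24h from the previous run) instead of computing
    // a runAt at midnight UTC; recurring tasks are never marked as `failed`.
    schedule: { interval: '24h' },
    state: {},
    params: {},
  });
}
```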
The following verification steps can be done using this query in Dev
Tools
```
GET .kibana_task_manager/_search
{
"query": {
"term": {
"task.taskType": "actions_telemetry"
}
}
}
```
## To verify existing tasks migrating to a schedule
1. Using `main`, setup a fresh Kibana and ES instance
2. Keep Elasticsearch running but shut down Kibana after setup is
complete
3. Switch from `main` to this PR
4. Add `await taskManager.runSoon(TASK_ID);` after the `ensureScheduled`
call within `x-pack/plugins/actions/server/usage/task.ts`.
5. Startup Kibana
6. Go in Dev Tools and pull the task information to see a new `schedule`
attribute added
## To verify fresh installs
1. Using this PR code, setup a fresh Kibana and ES instance
2. Go in Dev Tools and pull the task information to see a new `schedule`
attribute added
Flaky test runner:
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/2017
---------
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Similar to https://github.com/elastic/kibana/pull/144910
In this PR, I'm removing the `maxConcurrency` from the
`apm-source-map-migration-task` task type, given that it only has a
single task created for it. The concurrency setting limits how many
tasks of that type a single Kibana process will handle at most, and it
internally requires a separate task-claiming query to run every poll
interval to claim those tasks.
With this PR, task manager goes from running 3 update_by_query requests
to 2 every 3 seconds, reducing the stress put on Elasticsearch.
For more details, see `maxConcurrency` here
https://github.com/elastic/kibana/tree/main/x-pack/plugins/task_manager#task-definitions.
I've also created an allow list of which task types can set a
`maxConcurrency` given the consequences this has on system performance.
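As a rough illustration of the allow-list idea (names and structure are assumptions, not the PR's actual implementation):
```ts
// Only a known set of task types may declare maxConcurrency, because each
// such type costs an extra task-claiming query per poll interval.
const TASK_TYPES_ALLOWED_TO_SET_MAX_CONCURRENCY = new Set<string>([
  'report:execute', // illustrative entry
]);

interface TaskDefinitionLike {
  type: string;
  maxConcurrency?: number;
}

export function assertMaxConcurrencyAllowed(definition: TaskDefinitionLike): void {
  if (
    definition.maxConcurrency !== undefined &&
    !TASK_TYPES_ALLOWED_TO_SET_MAX_CONCURRENCY.has(definition.type)
  ) {
    throw new Error(
      `Task type "${definition.type}" may not set maxConcurrency; add it to the allow list if this is intentional.`
    );
  }
}
```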
---------
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Fixes https://github.com/elastic/kibana/issues/149344
This PR migrates all plugins to packages automatically. It does this
using `node scripts/lint_packages` to automatically migrate
`kibana.json` files to `kibana.jsonc` files. By doing this automatically
we can simplify many build and testing procedures to only support
packages, and not both "packages" and "synthetic packages" (basically
pointers to plugins).
The majority of changes are in operations related code, so we'll be
having operations review this before marking it ready for review. The
vast majority of the code owners are simply pinged because we deleted
all `kibana.json` files and replaced them with `kibana.jsonc` files, so
we plan on leaving the PR ready-for-review for about 24 hours before
merging (after feature freeze), assuming we don't have any blockers
(especially from @elastic/kibana-core since there are a few core
specific changes, though the majority were handled in #149370).
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Resolves https://github.com/elastic/kibana/issues/148914
Resolves https://github.com/elastic/kibana/issues/149090
Resolves https://github.com/elastic/kibana/issues/149091
Resolves https://github.com/elastic/kibana/issues/149092
In this PR, I'm making the following Task Manager bulk APIs retry
whenever conflicts are encountered: `bulkEnable`, `bulkDisable`, and
`bulkUpdateSchedules`.
To accomplish this, the following had to be done:
- Revert the original PR (https://github.com/elastic/kibana/pull/147808)
because the retries didn't load the updated documents whenever version
conflicts were encountered and the approach had to be redesigned.
- Create a `retryableBulkUpdate` function that can be re-used among the
bulk APIs (sketched after this list).
- Fix a bug in `task_store.ts` where `version` field wasn't passed
through properly (no type safety for some reason)
- Remove `entity` from being returned on bulk update errors. This helped
re-use the same response structure when objects weren't found
- Create a `bulkGet` API on the task store so we get the latest
documents prior to an ES refresh happening
- Create a single mock task function that mocks task manager tasks for
unit test purposes. This was necessary as other places were doing `as
unknown as BulkUpdateTaskResult` and escaping type safety
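A hedged sketch of the retry-on-conflict shape described above, with simplified types (not the real `retryableBulkUpdate` implementation):
```ts
interface TaskDoc {
  id: string;
  version?: string;
  attributes: Record<string, unknown>;
}
interface BulkUpdateOutcome {
  id: string;
  error?: { statusCode: number };
}
interface TaskStoreLike {
  bulkGet(ids: string[]): Promise<TaskDoc[]>;
  bulkUpdate(docs: TaskDoc[]): Promise<BulkUpdateOutcome[]>;
}

// Retry bulk updates for the tasks that hit 409 version conflicts,
// re-fetching the latest documents via bulkGet before each retry.
export async function retryableBulkUpdate(
  store: TaskStoreLike,
  ids: string[],
  map: (doc: TaskDoc) => TaskDoc,
  maxAttempts = 3
): Promise<BulkUpdateOutcome[]> {
  let remaining = ids;
  let results: BulkUpdateOutcome[] = [];
  for (let attempt = 0; attempt < maxAttempts && remaining.length > 0; attempt++) {
    const latest = await store.bulkGet(remaining);
    const outcomes = await store.bulkUpdate(latest.map(map));
    // Replace stale outcomes for retried ids with the fresh ones.
    results = results.filter((r) => !remaining.includes(r.id)).concat(outcomes);
    remaining = outcomes
      .filter((o) => o.error?.statusCode === 409)
      .map((o) => o.id);
  }
  return results;
}
```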
Flaky test runs:
- [Framework]
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/1776
- [Kibana Security]
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/1786
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
This PR upgrades uuid to its latest version, `9.0.0`.
The previously used default version, `v4`, was kept where it was already
in use, and places using `v1` or `v5` continue to use those versions.
The latest version removes the deep import feature, and since we are not
using tree shaking, this increased our bundles by a significant size. As
such, I've moved this dependency into the `ui-shared-deps-npm` bundle.
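For reference, the import style that replaces the removed deep imports (e.g. `require('uuid/v4')`) looks like this:
```ts
// uuid@9 exposes the versioned generators as named exports from the package root.
import { v1 as uuidv1, v4 as uuidv4, v5 as uuidv5 } from 'uuid';

const timeBased = uuidv1();
const random = uuidv4();
// v5 is deterministic: the same name + namespace always yields the same UUID.
const namespaced = uuidv5('kibana.example', uuidv5.DNS);
```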
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Resolves: https://github.com/elastic/kibana/issues/109937
In this PR, I'm fixing the `Task Manager detected a degradation in
performance` log message to provide the proper versioned link to the
task manager health monitoring docs. Prior to this PR, it would always
point to main / master docs.
## To verify
1. Turn on debug logging in your kibana.yml file
```
logging:
loggers:
- name: plugins.taskManager
level: debug
```
2. Move this code line outside of the `if (logLevel..` statement =>
4c7ce9d249/x-pack/plugins/task_manager/server/lib/log_health_metrics.ts (L99)
3. Startup Kibana
4. Notice the `Task Manager detected a degradation in performance...`
logged
5. Test the URL provided by the log message
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
## Dearest Reviewers 👋
I've been working on this branch with @mistic and @tylersmalley and
we're really confident in these changes. Additionally, this changes code
in nearly every package in the repo so we don't plan to wait for reviews
to get in before merging this. If you'd like to have a concern
addressed, please feel free to leave a review, but assuming that nobody
raises a blocker in the next 24 hours we plan to merge this EOD pacific
tomorrow, 12/22.
We'll be paying close attention to any issues this causes after merging
and work on getting those fixed ASAP. 🚀
---
The operations team is not confident that we'll have the time to achieve
what we originally set out to accomplish by moving to Bazel with the
time and resources we have available. We have also bought ourselves some
headroom with improvements to babel-register, optimizer caching, and
typescript project structure.
In order to make sure we deliver packages as quickly as possible (many
teams really want them), with a usable and familiar developer
experience, this PR removes Bazel for building packages in favor of
using the same JIT transpilation we use for plugins.
Additionally, packages now use `kbn_references` (again, just copying the
dx from plugins to packages).
Because of the complex relationships between packages/plugins and in
order to prepare ourselves for automatic dependency detection tools we
plan to use in the future, this PR also introduces a "TS Project Linter"
which will validate that every tsconfig.json file meets a few
requirements:
1. the chain of base config files extended by each config includes
`tsconfig.base.json` and not `tsconfig.json`
2. the `include` config is used, and not `files`
3. the `exclude` config includes `target/**/*`
4. the `outDir` compiler option is specified as `target/types`
5. none of these compiler options are specified: `declaration`,
`declarationMap`, `emitDeclarationOnly`, `skipLibCheck`, `target`,
`paths`
6. all references to other packages/plugins use their pkg id, ie:
```js
// valid
{
"kbn_references": ["@kbn/core"]
}
// not valid
{
"kbn_references": [{ "path": "../../../src/core/tsconfig.json" }]
}
```
7. only packages/plugins which are imported somewhere in the ts code are
listed in `kbn_references`
This linter is not only validating all of the tsconfig.json files, but
it also will fix these config files to deal with just about any
violation that can be produced. Just run `node scripts/ts_project_linter
--fix` locally to apply these fixes, or let CI take care of
automatically fixing things and pushing the changes to your PR.
> **Example:** [`64e93e5`
(#146212)](64e93e5806)
When I merged main into my PR it included a change which removed the
`@kbn/core-injected-metadata-browser` package. After resolving the
conflicts I missed a few tsconfig files which included references to the
now removed package. The TS Project Linter identified that these
references were removed from the code and pushed a change to the PR to
remove them from the tsconfig.json files.
## No bazel? Does that mean no packages??
Nope! We're still doing packages but we're pretty sure now that we won't
be using Bazel to accomplish the 'distributed caching' and 'change-based
tasks' portions of the packages project.
This PR actually makes packages much easier to work with and will be
followed up with the bundling benefits described by the original
packages RFC. Then we'll work on documentation and advocacy for using
packages for any and all new code.
We're pretty confident that implementing distributed caching and
change-based tasks will be necessary in the future, but because of
recent improvements in the repo we think we can live without them for
**at least** a year.
## Wait, there are still BUILD.bazel files in the repo
Yes, there are still three webpack bundles which are built by Bazel: the
`@kbn/ui-shared-deps-npm` DLL, `@kbn/ui-shared-deps-src` externals, and
the `@kbn/monaco` workers. These three webpack bundles are still created
during bootstrap and remotely cached using bazel. The next phase of this
project is to figure out how to get the package bundling features
described in the RFC with the current optimizer, and we expect these
bundles to go away then. Until then any package that is used in those
three bundles still needs to have a BUILD.bazel file so that they can be
referenced by the remaining webpack builds.
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Resolves https://github.com/elastic/kibana/issues/145289
## Summary
This PR removes a field, fixes a bug with the previous metrics, and
keeps a running count for the life of the Kibana process.
### Checklist
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
### To verify
- Create a rule and hit the
`/internal/task_manager/_background_task_utilization` api
- Verify that the counts are increasing
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
https://jestjs.io/blog/2022/04/25/jest-28
https://jestjs.io/blog/2022/08/25/jest-29
- jest.useFakeTimers('legacy') -> jest.useFakeTimers({ legacyFakeTimers:
true });
- jest.useFakeTimers('modern'); -> jest.useFakeTimers();
- tests can either use promises or callbacks, but not both
- test runner jasmine is no longer included, switch all suites to
jest-circus
Co-authored-by: Andrew Tate <andrew.tate@elastic.co>
In this PR, I'm adding a new setting
(`xpack.task_manager.monitored_stats_health_verbose_log.level`) that
allows the task manager monitored stats to be verbosely logged at info
level instead of debug.
The two supported values are:
- debug (default)
- info
This will help debug SDHs on Cloud, where we don't want to turn on debug
level logging for the entire cluster but would still like to see the
task manager monitored stats over time.
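A minimal sketch of how the setting changes the logging call (simplified; not the actual `log_health_metrics.ts` code):
```ts
// Simplified stand-ins; the real config lives under
// xpack.task_manager.monitored_stats_health_verbose_log.
interface LoggerLike {
  info(message: string): void;
  debug(message: string): void;
}
interface VerboseLogConfig {
  enabled: boolean;
  level: 'debug' | 'info';
}

export function logLatestMonitoredStats(
  logger: LoggerLike,
  config: VerboseLogConfig,
  statsJson: string
): void {
  if (!config.enabled) return;
  const message = `Latest Monitored Stats: ${statsJson}`;
  if (config.level === 'info') {
    // Visible without enabling debug logging for the whole logger.
    logger.info(message);
  } else {
    logger.debug(message);
  }
}
```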
## Cloud allow-list PR
https://github.com/elastic/cloud/pull/109563
## To verify
1. Set the following two configuration options:
```
xpack.task_manager.monitored_stats_health_verbose_log.enabled: true
xpack.task_manager.monitored_stats_health_verbose_log.level: info
```
2. Startup Kibana
3. Notice `Latest Monitored Stats:` are logged at info level
4. Remove `xpack.task_manager.monitored_stats_health_verbose_log.level`
configuration
5. Add the following configuration
```
logging:
loggers:
- name: plugins.taskManager
level: debug
```
6. Restart Kibana
7. Notice `Latest Monitored Stats:` are logged at debug level (as usual)
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
In this PR, I'm cleaning up no-longer-used code within Task Manager for
claiming tasks by id. The code was previously used for running alerting
rules right away, but `runNow` was recently replaced with `runSoon`,
which no longer needs this code to function. For more
info on the previous change, see
https://github.com/elastic/kibana/issues/133550.
* Adding scaling metrics
* Adding utilization tests
* Changing the key name
* Updating to use new api
* Updating task created counter
* Fixing tests
* Fixing tests
* Adding telemetry
* Changing telemetry field
* Updating telemetry schema
* Fixing failing test
* Fixed typos
* Update x-pack/plugins/task_manager/server/routes/background_task_utilization.test.ts
Co-authored-by: Ying Mao <ying.mao@elastic.co>
* Addressing pr feedback
* Updating to use configurable interval
* Updating metrics to be counts
Co-authored-by: Ying Mao <ying.mao@elastic.co>
* create base for bulk delete route
* add bulk delete method
* add bulkDelete method for task manager
* first version of retry if conflicts for bulk delete
* move waitBeforeNextRetry to separate file
* add happy path for unit and integration tests
* rewrite retry_if_bulk_delete and add tests
* fix happy path integration test
* add some tests and fix bugs
* make bulk_delete endpoint external
* give up on types for endpoint arguments
* add unit tests for new bulk delete route
* add unit tests for bulk delete rules clients method
* add integration tests
* api integration test running
* add integration tests
* use bulk edit constant in log audit
* unskip skipped test
* fix conditional statement for taskIds
* api integration for bulkDelete is done
* small code style changes
* delete comments and rename types
* get rid of pmap without async
* delete not used part of return
* add audit logs for all deleted rules
* add unit tests for audit logs
* delete extra comments and rename constant
* delete extra tests
* fix audit logs
* restrict amount of passed ids to 1000
* fix audit logs again
* fix alerting security tests
* test case when user passes more than 1000 ids
* fix writing and case when you send no args
* fix line in the text
* delete extra test
* fix type for rules we pass to bulk delete
* add catch RuleTypeDisabledError
* wait before next retry func tests
* fix tests for retry function
* fix bulk delete rule client tests
* add to api return task ids failed to be deleted and wrap task manager call in try catch
* fix type for task manager
Co-authored-by: Xavier Mouligneau <xavier.mouligneau@elastic.co>
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Changes search service/search session infrastructure to improve performance, stability, and resiliency by ensuring that search sessions don’t add additional load on a cluster when the feature is not used
* Splitting bulk enable and disable and resetting runAt and scheduledAt on enable
* Adding functional test
* Adding functional test
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* wip
* wip
* Fixing types and adding unit tests for task manager disable
* Updating to enable/disable. Update rules client to use new fns
* Updating unit tests. Fixing enable to still schedule task if necessary
* Adding functional test for task manager migration
* Fixing query. Updating functional tests
* Setting scheduledTaskId to null on disable only if it does not match rule id
* Updating README
* Fixing tests
* Task manager runner doesn't overwrite enabled on update
* Updating migration to set enabled: false for failed and unrecognized tasks
* Fixing tests
* PR feedback
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* Initial commit to optionally disable task polling
* Stops polling when flag is false. Removes runtime from health API. Updates health check to not use runtime when not polling
* Fixing types
* Updating tests
* Updating task manager plugin start to use node roles and added tests
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* Showing stack trace for alerting task runner
* Adding stack traces for action and task running
* wip
* Updating unit tests
* wip
* Don't return stack trace in action result
* Updating unit tests
* Updating functional tests
* Updating functional tests
* Separate log for error
* Separate log for error
* Moving error log
* Trying out putting stack trace in meta
* two logs and tags
* Adding tags to the error logging
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Addresses: https://github.com/elastic/kibana/issues/124850
## Summary
- Adds a new Task Manager API method, `bulkUpdateSchedules`
- Adds a call to `taskManager.bulkUpdateSchedules` in rulesClient.bulkEdit to update tasks if updated rules have a `scheduledTaskId` property
- Enables the rest of the operations for rulesClient.bulkEdit (set schedule, notifyWhen, throttle)
#### bulkUpdateSchedules
Using `bulkUpdateSchedules`, you can instruct Task Manager to update the interval of tasks that are in `idle` status.
When the interval is updated, a new `runAt` will be computed and the task will be updated with that value.
```js
export class Plugin {
  constructor() {}

  public setup(core: CoreSetup, plugins: { taskManager }) {}

  public async start(core: CoreStart, plugins: { taskManager }) {
    try {
      const bulkUpdateResults = await plugins.taskManager.bulkUpdateSchedules(
        ['97c2c4e7-d850-11ec-bf95-895ffd19f959', 'a5ee24d1-dce2-11ec-ab8d-cf74da82133d'],
        { interval: '10m' }
      );
      // If no error is thrown, the bulkUpdateSchedules call has completed successfully.
      // However, some individual task updates can still fail, e.g. due to an OCC 409 conflict.
    } catch (err) {
      // If an error is caught, the whole request failed and the tasks weren't updated.
    }
  }
}
```
### in follow-up PRs
- use `taskManager.bulkUpdateSchedules` in rulesClient.update (https://github.com/elastic/kibana/pull/134027)
- functional test for bulkEdit (https://github.com/elastic/kibana/pull/133635)
### Checklist
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
### Release note
Adds a new method to Task Manager, `bulkUpdateSchedules`, that allows bulk updates of scheduled tasks.
Adds 3 new operations to rulesClient.bulkEdit: update of schedule, notifyWhen, and throttle.