## Summary
Resolves: https://github.com/elastic/kibana/issues/151463
Removes all references to ephemeral tasks from the task manager plugin,
along with the related unit and E2E tests, while maintaining backwards
compatibility: the `xpack.task_manager.ephemeral_tasks` flag becomes a
no-op if set. This PR depends on the PR that removes ephemeral task
support from the alerting and actions plugins
(https://github.com/elastic/kibana/pull/197421), so it should be merged
after that PR.
Deprecates the following configuration settings:
- xpack.task_manager.ephemeral_tasks.enabled
- xpack.task_manager.ephemeral_tasks.request_capacity
Users don't have to change anything on their end if they don't wish to.
With this deprecation, if the above settings are defined, Kibana simply
ignores them.
### Checklist
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
Resolves https://github.com/elastic/kibana/issues/192686
## Summary
Creates a background task to search for removed task types and mark them
as unrecognized. Removes the current logic that does this during the
task claim cycle for both task claim strategies.
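As a rough illustration of what such a background task might do on each run, here is a minimal sketch. The `TaskDoc` shape, the `markRemovedTaskTypes` helper, and the `removedTypes` set are hypothetical names for illustration, not the plugin's actual code.

```typescript
// Hypothetical, simplified task document; real task documents carry more fields.
interface TaskDoc {
  id: string;
  taskType: string;
  status: string;
}

// Returns updated copies of the tasks whose type has been removed,
// with their status set to "unrecognized". Other tasks are left out.
function markRemovedTaskTypes(tasks: TaskDoc[], removedTypes: Set<string>): TaskDoc[] {
  return tasks
    .filter((task) => removedTypes.has(task.taskType) && task.status !== 'unrecognized')
    .map((task) => ({ ...task, status: 'unrecognized' }));
}
```

Running this outside the claim cycle keeps the claiming queries focused on tasks that can actually be run.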
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Resolves https://github.com/elastic/kibana/issues/181145
## Summary
Adds an optional flag `shouldDeleteTask` to a successful task run
result. If this flag is set to true, task manager will remove the task
at the end of the processing cycle. This allows tasks to gracefully
inform us that they need to be deleted without throwing an unrecoverable
error (the current way that tasks tell us they want to be deleted).
Audited existing usages of `throwUnrecoverableError`. Other than usages
within the alerting and actions task runner, which are thrown for valid
error states, all other usages were by tasks that were considered
outdated and should be deleted. Updated all those usages to return the
`shouldDeleteTask` run result.
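A minimal sketch of how a task could use the flag, and how the result might be interpreted after the run. Only `shouldDeleteTask` comes from this PR; the `SuccessfulRunResult` shape, `runOutdatedTask`, and `afterRun` are simplified, hypothetical stand-ins.

```typescript
// Hypothetical, simplified run result; only `shouldDeleteTask` is taken from this PR.
interface SuccessfulRunResult {
  state: Record<string, unknown>;
  shouldDeleteTask?: boolean;
}

// Hypothetical runner for an outdated task that asks to be removed
// instead of throwing an unrecoverable error.
function runOutdatedTask(): SuccessfulRunResult {
  return { state: {}, shouldDeleteTask: true };
}

// Hypothetical post-run handling: delete the task only when the flag is explicitly true.
function afterRun(result: SuccessfulRunResult): 'delete' | 'keep' {
  return result.shouldDeleteTask === true ? 'delete' : 'keep';
}
```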
---------
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Towards: #176585
This PR removes the task skipping logic from TaskManager, PRs for
Alerting and Actions will follow.
## To verify
Rules and actions should still be working as expected.
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Resolves https://github.com/elastic/kibana/issues/174353
## Summary
Adds the ability for a task instance to specify a timeout override that
will be used in place of the task type timeout when running an ad-hoc task.
In the future we may consider allowing timeout overrides for recurring
tasks but this PR limits usage to only ad-hoc task runs.
This timeout override is planned for use by backfill rule execution
tasks so the only usages in this PR are in the functional tests.
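The resolution rule described above can be sketched as follows. The field name `timeoutOverride` and the helper `effectiveTimeout` are assumptions for illustration; a task without a `schedule` is treated as ad-hoc.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface TaskTypeDefinition {
  timeout: string;
}

interface TaskInstance {
  schedule?: { interval: string };
  timeoutOverride?: string; // assumed field name
}

// The override applies only to ad-hoc tasks (no schedule);
// recurring tasks keep the task type timeout.
function effectiveTimeout(instance: TaskInstance, definition: TaskTypeDefinition): string {
  const isAdHoc = instance.schedule === undefined;
  if (isAdHoc && instance.timeoutOverride !== undefined) {
    return instance.timeoutOverride;
  }
  return definition.timeout;
}
```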
Resolves https://github.com/elastic/kibana/issues/174352
## Summary
Adds an optional `priority` definition to task types which defaults to
`Normal` priority. Updates the task claiming update by query to include
a new scripted sort that sorts by priority in descending order so that
highest priority tasks are claimed first.
This priority field is planned for use by backfill rule execution tasks
so the only usages in this PR are in the functional tests.
Also included an integration test that will ping the team if a task type
explicitly sets a priority in the task definition.
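To illustrate the ordering, here is an in-memory sketch of the same idea the scripted sort applies inside the update by query. The numeric priority values, the `runAt` tie-break, and all names below are assumptions for illustration.

```typescript
// Assumed numeric values; only the existence of a `Normal` default comes from this PR.
const TaskPriority = { Low: 1, Normal: 50 } as const;

interface TaskTypeDefinition {
  priority?: number;
}

interface ClaimableTask {
  id: string;
  taskType: string;
  runAt: number; // epoch millis
}

// Sorts by priority descending; ties fall back to the earliest runAt (an assumption).
function sortForClaiming(
  tasks: ClaimableTask[],
  definitions: Record<string, TaskTypeDefinition>
): ClaimableTask[] {
  const priorityOf = (task: ClaimableTask): number =>
    definitions[task.taskType]?.priority ?? TaskPriority.Normal;
  return [...tasks].sort((a, b) => priorityOf(b) - priorityOf(a) || a.runAt - b.runAt);
}
```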
---------
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Resolves https://github.com/elastic/kibana/issues/163958.
Resolves https://github.com/elastic/kibana/issues/163023.
In this PR, I'm modifying the task store's aggregate function to exclude
tasks that are disabled. This function is only used by the monitoring
functionality of the alerting and actions plugins and by the Task
Manager's health API, all of which had bugs because they should not have
been considering disabled tasks.
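One way to express such an exclusion is to wrap the caller's query in a `bool` query. This is only a sketch: the `task.enabled` field name and the `excludeDisabledTasks` helper are assumptions, and the actual change may build the filter differently.

```typescript
type EsQuery = Record<string, unknown>;

// Wraps an optional query so that documents with `task.enabled: false` are excluded.
// Using `must_not` keeps tasks that have no `enabled` field at all (a design assumption).
function excludeDisabledTasks(query?: EsQuery): EsQuery {
  return {
    bool: {
      filter: query ? [query] : [],
      must_not: [{ term: { 'task.enabled': false } }],
    },
  };
}
```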
## To verify
1. Create 20 alerting rules running every 1s
2. Call the `/api/task_manager/_health` endpoint
3. Notice capacity_estimation stats are changing to accommodate 20 rules
constantly running
4. Disable all the alerting rules
5. Call the `/api/task_manager/_health` endpoint (wait for runtime and
workload stats to update by looking at their timestamp and ensuring it's
after the time you disabled the rules)
6. Notice capacity_estimation stats no longer consider the 20 rules that
used to run constantly
Part of https://github.com/elastic/kibana/issues/159342.
In this PR, I'm preparing the non-alerting (rule types) response ops
task types for serverless by defining an explicit task state schema.
This schema is used to validate the task's state before saving but also
when reading. In the scenario an older Kibana node runs a task after a
newer Kibana node has stored additional task state, the unknown state
properties will be dropped. Additionally, this will prompt developers to
be aware that adding required fields to the task state is a breaking
change that must be handled with care. (see
https://github.com/elastic/kibana/issues/155764).
For more information on how to use `stateSchemaByVersion`, see
https://github.com/elastic/kibana/pull/159048 and
https://github.com/elastic/kibana/blob/main/x-pack/plugins/task_manager/README.md.
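The "unknown properties are dropped" behaviour can be pictured with a small sketch. This does not use the real schema library; `dropUnknownKeys` is a hypothetical helper that only keeps the keys a given schema version knows about.

```typescript
// Keeps only the keys known to the (older) schema version; everything else is dropped.
function dropUnknownKeys<K extends string>(
  state: Record<string, unknown>,
  knownKeys: readonly K[]
): Partial<Record<K, unknown>> {
  const result: Partial<Record<K, unknown>> = {};
  for (const key of knownKeys) {
    if (key in state) {
      result[key] = state[key];
    }
  }
  return result;
}

// State written by a newer node, read by an older node that only knows `runs`.
const stateFromNewerNode = { runs: 3, lastErrorMessage: 'timeout' };
const stateSeenByOlderNode = dropUnknownKeys(stateFromNewerNode, ['runs'] as const);
```

This also shows why adding a required field is a breaking change: an older node would write state back without it.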
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Resolves: #155766
Resolves: #159302
With this PR we aim to skip a task that has invalid direct or indirect
params.
To do that:
1. We validate the task params before calling the subtask's run method
and skip if the task params are invalid.
2. We skip execution of a subtask (rule, action, etc.) when its run
method returns a `SkipError`.
Therefore, validations in the run methods need to be moved to the top of
the run method and executed before anything else, so that the method can
return a skip if the data is invalid.
We also added a config to enable/disable the skip feature and to define
the delay before the task is rescheduled.
As this may become an infinite loop, we should limit the number of
attempts.
Follow on issue to implement that: #159302
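The two skip paths described above can be sketched as follows. Only the `SkipError` name comes from this PR; the `executeWithSkip` helper, the config shape, and the outcome type are hypothetical.

```typescript
// Only the name `SkipError` is taken from this PR; the class body is a stand-in.
class SkipError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps `instanceof` working when compiled to older JS targets.
    Object.setPrototypeOf(this, SkipError.prototype);
  }
}

type Outcome = 'ran' | 'skipped';

// Hypothetical wrapper: validate first, then treat a SkipError from the run as a skip.
function executeWithSkip<P>(
  params: P,
  isValid: (params: P) => boolean,
  run: (params: P) => void,
  config: { skipEnabled: boolean }
): Outcome {
  if (config.skipEnabled && !isValid(params)) {
    return 'skipped'; // path 1: invalid task params, the run method is never called
  }
  try {
    run(params);
    return 'ran';
  } catch (err) {
    if (config.skipEnabled && err instanceof SkipError) {
      return 'skipped'; // path 2: the subtask asked to be skipped
    }
    throw err;
  }
}
```

A skipped task would then be rescheduled after the configured delay, which is why an attempt limit is needed.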
---------
Co-authored-by: Patryk Kopycinski <contact@patrykkopycinski.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Fixes https://github.com/elastic/kibana/issues/149344
This PR migrates all plugins to packages automatically. It does this
using `node scripts/lint_packages` to automatically migrate
`kibana.json` files to `kibana.jsonc` files. By doing this automatically
we can simplify many build and testing procedures to only support
packages, and not both "packages" and "synthetic packages" (basically
pointers to plugins).
The majority of changes are in operations related code, so we'll be
having operations review this before marking it ready for review. The
vast majority of the code owners are simply pinged because we deleted
all `kibana.json` files and replaced them with `kibana.jsonc` files, so
we plan on leaving the PR ready-for-review for about 24 hours before
merging (after feature freeze), assuming we don't have any blockers
(especially from @elastic/kibana-core since there are a few core
specific changes, though the majority were handled in #149370).
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
## Dearest Reviewers 👋
I've been working on this branch with @mistic and @tylersmalley and
we're really confident in these changes. Additionally, this changes code
in nearly every package in the repo so we don't plan to wait for reviews
to get in before merging this. If you'd like to have a concern
addressed, please feel free to leave a review, but assuming that nobody
raises a blocker in the next 24 hours we plan to merge this EOD pacific
tomorrow, 12/22.
We'll be paying close attention to any issues this causes after merging
and work on getting those fixed ASAP. 🚀
---
The operations team is not confident that we'll have the time to achieve
what we originally set out to accomplish by moving to Bazel with the
time and resources we have available. We have also bought ourselves some
headroom with improvements to babel-register, optimizer caching, and
typescript project structure.
In order to make sure we deliver packages as quickly as possible (many
teams really want them), with a usable and familiar developer
experience, this PR removes Bazel for building packages in favor of
using the same JIT transpilation we use for plugins.
Additionally, packages now use `kbn_references` (again, just copying the
dx from plugins to packages).
Because of the complex relationships between packages/plugins and in
order to prepare ourselves for automatic dependency detection tools we
plan to use in the future, this PR also introduces a "TS Project Linter"
which will validate that every tsconfig.json file meets a few
requirements:
1. the chain of base config files extended by each config includes
`tsconfig.base.json` and not `tsconfig.json`
2. the `include` config is used, and not `files`
3. the `exclude` config includes `target/**/*`
4. the `outDir` compiler option is specified as `target/types`
5. none of these compiler options are specified: `declaration`,
`declarationMap`, `emitDeclarationOnly`, `skipLibCheck`, `target`,
`paths`
6. all references to other packages/plugins use their pkg id, ie:
```js
// valid
{
  "kbn_references": ["@kbn/core"]
}
// not valid
{
  "kbn_references": [{ "path": "../../../src/core/tsconfig.json" }]
}
```
7. only packages/plugins which are imported somewhere in the ts code are
listed in `kbn_references`
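To make a few of these rules concrete, here is a small sketch of what such checks could look like. It is not the linter's implementation: the `TsConfig` shape and `lintTsConfig` are hypothetical, and the rules about the base config chain and unused references are left out because they need file system and import analysis.

```typescript
// Hypothetical, simplified view of a tsconfig.json file.
interface TsConfig {
  include?: string[];
  files?: string[];
  exclude?: string[];
  compilerOptions?: Record<string, unknown>;
  kbn_references?: unknown[];
}

const FORBIDDEN_COMPILER_OPTIONS = [
  'declaration',
  'declarationMap',
  'emitDeclarationOnly',
  'skipLibCheck',
  'target',
  'paths',
];

// Returns one message per violated rule; an empty array means the config passes.
function lintTsConfig(config: TsConfig): string[] {
  const errors: string[] = [];
  const compilerOptions = config.compilerOptions ?? {};

  if (!config.include || config.files) {
    errors.push('use `include`, not `files`');
  }
  if ((config.exclude ?? []).indexOf('target/**/*') === -1) {
    errors.push('`exclude` must include `target/**/*`');
  }
  if (compilerOptions.outDir !== 'target/types') {
    errors.push('`outDir` must be `target/types`');
  }
  for (const option of FORBIDDEN_COMPILER_OPTIONS) {
    if (option in compilerOptions) {
      errors.push('compiler option `' + option + '` must not be specified');
    }
  }
  for (const ref of config.kbn_references ?? []) {
    if (typeof ref !== 'string') {
      errors.push('references must use the pkg id, not a path');
    }
  }
  return errors;
}
```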
This linter is not only validating all of the tsconfig.json files, but
it also will fix these config files to deal with just about any
violation that can be produced. Just run `node scripts/ts_project_linter
--fix` locally to apply these fixes, or let CI take care of
automatically fixing things and pushing the changes to your PR.
> **Example:** [`64e93e5`
(#146212)](64e93e5806)
When I merged main into my PR it included a change which removed the
`@kbn/core-injected-metadata-browser` package. After resolving the
conflicts I missed a few tsconfig files which included references to the
now removed package. The TS Project Linter identified that these
references were removed from the code and pushed a change to the PR to
remove them from the tsconfig.json files.
## No bazel? Does that mean no packages??
Nope! We're still doing packages but we're pretty sure now that we won't
be using Bazel to accomplish the 'distributed caching' and 'change-based
tasks' portions of the packages project.
This PR actually makes packages much easier to work with and will be
followed up with the bundling benefits described by the original
packages RFC. Then we'll work on documentation and advocacy for using
packages for any and all new code.
We're pretty confident that implementing distributed caching and
change-based tasks will be necessary in the future, but because of
recent improvements in the repo we think we can live without them for
**at least** a year.
## Wait, there are still BUILD.bazel files in the repo
Yes, there are still three webpack bundles which are built by Bazel: the
`@kbn/ui-shared-deps-npm` DLL, `@kbn/ui-shared-deps-src` externals, and
the `@kbn/monaco` workers. These three webpack bundles are still created
during bootstrap and remotely cached using bazel. The next phase of this
project is to figure out how to get the package bundling features
described in the RFC with the current optimizer, and we expect these
bundles to go away then. Until then any package that is used in those
three bundles still needs to have a BUILD.bazel file so that they can be
referenced by the remaining webpack builds.
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
* Splitting bulk enable and disable and resetting runAt and scheduledAt on enable
* Adding functional test
* Adding functional test
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Addresses: https://github.com/elastic/kibana/issues/124850
## Summary
- Adds a new Task Manager API method, `bulkUpdateSchedules`
- Adds calling `taskManager.bulkUpdateSchedules` in rulesClient.bulkEdit to update tasks if updated rules have `scheduleTaskId` property
- Enables the rest of operations for rulesClient.bulkEdit (set schedule, notifyWhen, throttle)
#### bulkUpdateSchedules
Using `bulkUpdateSchedules` you can instruct TaskManager to update the interval of tasks that are in `idle` status.
When the interval is updated, a new `runAt` is computed and the task is updated with that value.
```ts
export class Plugin {
  constructor() {}

  public setup(core: CoreSetup, { taskManager }: { taskManager: TaskManagerSetupContract }) {}

  // `start` is async here so the bulk update can be awaited
  public async start(core: CoreStart, { taskManager }: { taskManager: TaskManagerStartContract }) {
    try {
      const bulkUpdateResults = await taskManager.bulkUpdateSchedules(
        ['97c2c4e7-d850-11ec-bf95-895ffd19f959', 'a5ee24d1-dce2-11ec-ab8d-cf74da82133d'],
        { interval: '10m' }
      );
      // If no error is thrown, bulkUpdateSchedules has completed successfully.
      // Updates of individual tasks can still fail, for example due to an OCC 409 conflict.
    } catch (err) {
      // If an error is caught, the whole request has failed and no tasks were updated.
    }
  }
}
```
### in follow-up PRs
- use `taskManager.bulkUpdateSchedules` in rulesClient.update (https://github.com/elastic/kibana/pull/134027)
- functional test for bulkEdit (https://github.com/elastic/kibana/pull/133635)
### Checklist
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
### Release note
Adds a new method to Task Manager, `bulkUpdateSchedules`, that allows bulk updates of scheduled tasks.
Adds 3 new operations to rulesClient.bulkEdit: update of schedule, notifyWhen, throttle.
* Use response-ops GitHub team everywhere, no more alerting services
* Update x-pack/test/plugin_api_integration/plugins/event_log/kibana.json
Co-authored-by: Ying Mao <ying.mao@elastic.co>
* make owner attribute required
* Add owner properties in more places
* add test for owner attribute
* add error check too in the test
* Fix tests
* fix tests and update docs
* wip
* More test fixes
* Fix All The Errorz
* Adding more owner attributes
* Update x-pack/test/saved_object_api_integration/common/fixtures/saved_object_test_plugin/kibana.json
Co-authored-by: Larry Gregory <lgregorydev@gmail.com>
* Update x-pack/test/ui_capabilities/common/fixtures/plugins/foo_plugin/kibana.json
Co-authored-by: Larry Gregory <lgregorydev@gmail.com>
* commeeeooonnnn
* Update docs
* soooo many kibanajsons
* adjust plugin generator to add an owner
* Add owner to the plugin generator scripts
* update snapshot
* Fix snapshot
* review updates
Co-authored-by: Larry Gregory <lgregorydev@gmail.com>
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* added ability to run ephemeral tasks
* fixed typing
* added typing on plugin
* WIP
* Fix type issues
* Hook up the ephemeral task into the task runner for actions
* Tasks can now run independently of one another
* Use deferred language
* Refactor taskParams slightly
* Use Promise.all
* Remove deferred logic
* Add config options to limit the amount of tasks executing at once
* Add ephemeral task monitoring
* WIP
* Add single test so far
* Ensure we log after actions have executed
* Remove confusing * 1
* Add logic to ensure we fallback to default enqueueing if the total actions is above the config
* Add additional test
* Fix tests a bit, ensure we log the alerting:actions-execute right away and the tests should listen for alerts:execute
* Better tests
* If the queue is at capacity, attempt to execute the ephemeral task as a regular action
* Ensure we run ephemeral tasks before to avoid them getting stuck in the queue
* Do not handle the promise anymore
* Remove unnecessary code
* Properly handle errors from ephemeral task lifecycle
* moved actions domain out of alerting and into actions plugin
* Remove some tests
* Fix TS and test issues
* Fix type issues
* Fix more type issues
* Fix more type issues
* Fix jest tests
* Fix more jest tests
* Off by default
* Fix jest tests
* Update config for this suite too
* Start of telemetry code
* Fix types and add missing files
* Fix telemetry schema
* Fix types
* Fix more types
* moved load event emission to pollingcycle and added health stats on Ephemeral tasks
* Add more telemetry data based on new health metrics for the ephemeral queue
* Fix tests and types
* Add separate request capacity for ephemeral queue
* Fix telemetry schema and add tests for usage collection
* track polled tasks by persistence and use in capacity estimation instead of executions
* fixed typing
* Bump default capacity
* added delay metric to ephemeral stats
* Fix bad merge
* Fix tests
* Fix tests
* Fix types
* Skip failing tests
* Exclude ephemeral stats from capacity estimation tests
* PR feedback
* More PR feedback
* PR feedback
* Fix merge conflict
* Try fixing CI
* Fix broken lock file from merge
* Match master
* Add this back
* PR feedback
* Change to queue and add test
* Disable ephemeral queue in tests
* Updated desc
* Comment out ephemeral-specific tests that require the entire test suite to support ephemeral tasks
* Add clarifying comment
Co-authored-by: Gidi Meir Morris <github@gidi.io>
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
When something causes an exception in `TaskRunner.markTaskAsRunning()`, its execution fails, but this happens before we update the SO, which means that this failure does not count towards the `attempts` on the task. Task Manager will continue to try running this task forever.
This PR increments the `attempts` when a failure occurs during `TaskRunner.markTaskAsRunning()` to ensure such a task doesn't continue to run to infinity.
Note that this fix will not affect `scheduled` tasks, as they are designed to _ignore_ their `attempts` and run forever. In such a case this task will continue to consume Task Manager resources until canceled, but these failures will be logged and can be identified when needed.
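A minimal sketch of the accounting described above, under simplified assumptions: `TaskDoc`, `recordMarkAsRunningFailure`, and `canRetry` are hypothetical names, and a task with a `schedule` stands in for a `scheduled` task.

```typescript
// Hypothetical, simplified task document.
interface TaskDoc {
  attempts: number;
  schedule?: { interval: string };
}

// Called when marking the task as running throws: the failure now counts as an attempt.
function recordMarkAsRunningFailure(task: TaskDoc): TaskDoc {
  return { ...task, attempts: task.attempts + 1 };
}

// Scheduled tasks ignore `attempts`; other tasks stop once they reach the maximum.
function canRetry(task: TaskDoc, maxAttempts: number): boolean {
  return task.schedule !== undefined || task.attempts < maxAttempts;
}
```

Without the increment, `attempts` would stay at its initial value and the retry check would never fail for a non-scheduled task.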
resolves #55634
resolves #65746
Buffers event docs being written for a fixed interval / buffer size,
and indexes those docs via a bulk ES call.
Also now flushing those buffers at plugin stop() time, which
we couldn't do before with the single index calls, which were
run via `setImmediate()`.
This is a redo of PR https://github.com/elastic/kibana/pull/80941 which
had to be reverted.
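The buffering idea can be sketched with a plain timer-based class. This is a swapped-in illustration, not the event log's code: `BufferedWriter`, its parameters, and the synchronous `bulkIndex` callback are all assumptions (a real bulk call to ES would be asynchronous).

```typescript
// Buffers docs and hands them to `bulkIndex` when the buffer is full,
// when the flush interval elapses, or when `flush()` is called (e.g. at plugin stop).
class BufferedWriter<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly bulkIndex: (docs: T[]) => void;
  private readonly maxSize: number;
  private readonly flushIntervalMs: number;

  constructor(bulkIndex: (docs: T[]) => void, maxSize: number, flushIntervalMs: number) {
    this.bulkIndex = bulkIndex;
    this.maxSize = maxSize;
    this.flushIntervalMs = flushIntervalMs;
  }

  write(doc: T): void {
    this.buffer.push(doc);
    if (this.buffer.length >= this.maxSize) {
      this.flush();
    } else if (this.timer === undefined) {
      // Start the interval timer on the first buffered doc.
      this.timer = setTimeout(() => this.flush(), this.flushIntervalMs);
    }
  }

  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.buffer.length === 0) {
      return;
    }
    const docs = this.buffer;
    this.buffer = [];
    this.bulkIndex(docs);
  }
}
```

Because pending docs live in one buffer, a final `flush()` at stop time can write them out, which was not possible with individual `setImmediate()` calls.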
resolves https://github.com/elastic/kibana/issues/55634
resolves https://github.com/elastic/kibana/issues/65746
Buffers event docs being written for a fixed interval / buffer size,
and indexes those docs via a bulk ES call.
Also now flushing those buffers at plugin stop() time, which
we couldn't do before with the single index calls, which were
run via `setImmediate()`.
This addresses a bug in Task Manager in the task timeout behaviour. When a recurring task's `retryAt` field is set (which happens at task run), it is currently scheduled to the task definition's `timeout` value, but the original intention was for these tasks to retry on their next scheduled run (originally identified as part of https://github.com/elastic/kibana/issues/39349).
In this PR we ensure recurring task retries are scheduled according to their recurring schedule, rather than the default `timeout` of the task type.
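As a sketch of the intended behaviour: a recurring task's `retryAt` is derived from its schedule interval, and only non-recurring tasks fall back to the task type `timeout`. The helpers `parseDurationMs` and `computeRetryAt` are hypothetical, and the duration format is assumed to be a number followed by `s`, `m`, or `h`.

```typescript
// Parses durations such as "30s", "5m" or "1h" into milliseconds (assumed format).
function parseDurationMs(duration: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) {
    throw new Error('Invalid duration: ' + duration);
  }
  const unitMs = { s: 1000, m: 60000, h: 3600000 }[match[2] as 's' | 'm' | 'h'];
  return Number(match[1]) * unitMs;
}

// Recurring tasks retry on their schedule; other tasks use the task type timeout.
function computeRetryAt(
  nowMs: number,
  task: { schedule?: { interval: string } },
  typeTimeout: string
): number {
  return nowMs + parseDurationMs(task.schedule?.interval ?? typeTimeout);
}
```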
* chore(NA): update gitignore to include first changes from moving into a single package.json
* chore(NA): update gitignore
* chore(NA): move all the dependencies into the single package.json and apply changes to bootstrap
* chore(NA): fix types problems after the single package json
* chore(NA): include code to find the dependencies used across the code
* chore(NA): introduce pure lockfile for install dependencies on build
* chore(NA): update clean task to not delete anything from xpack node_modules
* chore(NA): update gitignore to remove development temporary rules
* chore(NA): update notice file
* chore(NA): update jest snapshots
* chore(NA): fix whitelisted licenses to include a new specific form of an already included one
* chore(NA): remove check lockfile symlinks from child projects
* chore(NA): fix eslint and add missing declared deps on single pkg json
* chore(NA): correctly update notice
* chore(NA): fix failing jest test for storyshots.test.tsx
* chore(NA): fix cypress multi reporter path
* chore(NA): fix Project tests check
* chore(NA): fix problem with logic to detect used dependencies on oss build
* chore(NA): include correct x-pack plugins dep discovery
* chore(NA): discover entries under dynamic requires on vis_type_timelion
* chore(NA): remove canvas
* test(NA): fix jest unit tests
* chore(NA): remove double react declaration from storyshot test file
* chore(NA): try removing isOSS check
* chore(NA): support for plugin development
* chore(NA): update logic to fix unit tests and typechecking
* chore(NA): support to run npm scripts in child kbn projects across all envs
* chore(NA): support github checks reporter on x-pack and remove cpy types as the package correctly provides them
* chore(NA): update cpy version
* chore(NA): include last kbn pm changes
* chore(NA): update style on build_production_projects.ts
* chore(NA): remove any cast from telemetry opt in stats
* chore(NA): remove del and re-use rm -rf again
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* wip
* Adding updateFieldsAndMarkAsFailed function
* Updating UBQ
* Only updating retryAt if marking as claiming
* Updating query
* Updating query to only fail one time tasks that have exceeded max attempts
* Fixing tests
* Fixing tests
* Handling claiming tasks by id
* Removing unused function
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
This PR addresses a list of legacy code debt the plugin has incurred over the past year due to extensive changes in its internals and the adoption of the Kibana Platform.
It includes:
1. The `TaskManager` class has been split into several independent components: `TaskTypeDictionary`, `TaskPollingLifecycle`, `TaskScheduling`, `Middleware`. This has made it easier to understand the roles of the different parts and makes it easier to plug them into the observability work.
2. The exposed `mocks` have been corrected to accurately express the Kibana Platform API.
3. The lifecycle has been corrected to remove the need for intermediary streams/promises, which were needed when we first introduced the `setup`/`start` lifecycle to support legacy.
4. The Logger mocks have been replaced with the platform's `coreMocks` implementation
5. The integration tests now test the plugin's actual public api (instead of the internals).
6. The Legacy Elasticsearch client has been replaced with the typed client in response to the deprecation notice.
7. Typing has been narrowed to prevent the `type` field from conflicting with the key in the `TaskDictionary`. This could have caused the displayed `type` on a task to differ from the `type` used in the Dictionary itself (this broke a test during refactoring and could have caused a bug in production code if left).
* bump ts to v4
* MOAR RAM
* fix type errors for OSS
* first pass on x-pack errors
* second pass on x-pack type errors
* 3rd pass on x-pack type-errors
* mute errors if complex cases
* don't delete if spread suffices
* mute other complex cases
* make User fields optional
* fix optional types
* fix tests
* fix typings for time_range
* fix type errors in x-pack/tests
* rebuild kbn-pm
* remove leftovers from master update
* fix alert tests
* [Telemetry Checker] TS4 Fixes
* bump to 4.0.1-rc
* fix new errors in master
* bump typescript-eslint to version supporting TS v4 syntax
* fix merge commit errors
* update to the stable TS version 4.0.2
* bump ts-eslint to version supporting ts v4
* fix typo
* fix type errors after merge
* update ts in another new package.json
* TEMP: remove me
* Revert "TEMP: remove me"
This reverts commit dc0fc3bae6.
* [Telemetry] Update snapshot for new TS4 SyntaxKind
* bump prettier to support TS v4 syntax
* fix prettier rules
* last style change
* fix new type errors
Co-authored-by: Alejandro Fernández Haro <alejandro.haro@elastic.co>
This PR addresses two issues which caused several tests to be flaky in TM.
When `runNow` was introduced to TM we added a pinned query which returned specific tasks by ID.
This query does not have the filter applied to it, which causes tasks to be returned when they're already marked as `running`. We didn't handle these correctly, which caused flakiness in the tests.
This didn't cause broken behaviour, but it did cause behaviour that was hard to reason about. We now handle them correctly.
It seems that sometimes, especially if the ES queue is overworked, it can take some time for the update to the underlying task to become visible (we don't use `refresh:true` on purpose), so we added a wait for the index to refresh to make sure the task is updated in time for the next stage of the test.
* mark legacy ES client types as deprecated
* expose es client to plugins and update mocks
* ElasticSearchClientMock --> ElasticsearchClientMock
* expose es client mocks
* expose es client via RequestHandlerContext
* convert test/plugin_functional/config into ts
* convert top_nav test into ts
* add an integration test for the es client
* update comments to refer to the new es client
* fix import paths. do not use extensions
temp
* update docs
* fix other refs
* add test for a custom client
* fix context
* add test for scoped client
* update docs