**Epic:** https://github.com/elastic/kibana/issues/174168
**Partially addresses:**
https://github.com/elastic/kibana/issues/202078,
https://github.com/elastic/kibana/issues/210358
## Summary
We have started reworking our existing test plans for prebuilt rule
customization, upgrade, and export/import workflows and introducing
functional changes to them.
Specifically, this PR:
- Creates a new test plan for prebuilt rule upgrade notifications on the
Rule Management, Rule Details, and Rule Editing pages. The filename is
`prebuilt_rule_upgrade_notifications.md`.
- Extracts the existing scenarios for upgrade notifications on the Rule
Management page from `prebuilt_rule_upgrade_without_preview.md` into
`prebuilt_rule_upgrade_notifications.md`, and updates them to match the
most recent UI behavior.
- Adds new scenarios for upgrade notifications on the Rule Details page
to `prebuilt_rule_upgrade_notifications.md`.
- Adds new scenarios for upgrade notifications on the Rule Editing page
to `prebuilt_rule_upgrade_notifications.md`.
The new test plan should be in line with the changes discussed in
https://github.com/elastic/kibana/issues/210358.
## Summary
- Fixes the flaky functional test added in #210547 by adding a network
request intercept and clicking on the correct dropdown button (see the
sketch below)
- Unskips the test file
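For context, a minimal Cypress sketch of the intercept-then-click pattern used for deflaking; the route and the `data-test-subj` selector below are illustrative, not the actual ones from this test:
```ts
// Hypothetical route and selector, for illustration only.
cy.intercept('POST', '/internal/detection_engine/**').as('upgradeRequest');
// Click the correct dropdown button...
cy.get('[data-test-subj="updateRuleButtonDropdown"]').click();
// ...then wait for the intercepted request before asserting, so the test
// no longer races the network.
cy.wait('@upgradeRequest');
```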
## References
Closes #211959
### Checklist
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
## Summary
When VM image rebuild is triggered after ES promotion, only the cache
warmup should be built.
This PR also splits the daily full build into a daily base + cache
build (in case ES promotions are failing for some reason, we should
still have a daily cache refresh).
Requires: https://github.com/elastic/ci-agent-images/pull/1295
With this, we'd run a daily base image build and cache build (~40m +
25m) plus cache warmups for every promotion (~4x 25m), instead of a
full build per promotion (~4x 55m). Ultimately not that much of a
gain 🤷 (4x55 = 220m => 40 + 5x25 = 165m)
## Summary
Resolves https://github.com/elastic/kibana/issues/206924.
This PR adds the following query parameters to the agent list API (`GET
/api/fleet/agents`) in order to enable fetching beyond the first 10,000
hits:
```
searchAfter?: string;
openPit?: boolean;
pitId?: string;
pitKeepAlive?: string;
```
The agent list API response can now include the following properties:
```
// the PIT ID used
pit?: string;
// stringified version of the last agent's `sort` field,
// can be passed as `searchAfter` in the next request
nextSearchAfter?: string;
```
* `searchAfter` can be used with or without a `pitId`. When using
`searchAfter`, the `page` parameter is not accepted.
* `searchAfter` expects a stringified array. (Reviewers: I couldn't get
the Kibana request schema to accept a multi-part query param and convert
it to an array... I think this would be better, please let me know if
you know how to get that to work 🙏)
* A `pitKeepAlive` duration (e.g. `30s`, `1m`) must be present when
opening a PIT or retrieving results using a PIT ID.
* These can be used with the existing `sortField` and `sortOrder`
params. They default to `enrolled_at` and `desc` respectively.
### Example using only `searchAfter`:
```
# Retrieve the first 10k hits
curl -X GET 'http://<user>:<pass>@<kibana url>/api/fleet/agents?perPage=10000'
# Grab the `nextSearchAfter` param from the response
# Pass it to the new request to retrieve the next page of 10k hits
curl -X GET 'http://<user>:<pass>@<kibana url>/api/fleet/agents?perPage=10000&searchAfter=<nextSearchAfter>'
```
### Example using `searchAfter` with point-in-time parameters:
```
# Retrieve the first 10k hits and open a PIT
curl -X GET 'http://<user>:<pass>@<kibana url>/api/fleet/agents?perPage=10000&openPit=true&pitKeepAlive=5m'
# Grab the `pit` ID from the response
# Grab the `nextSearchAfter` param from the response
# Pass both to the new request to retrieve the next page of 10k hits
curl -X GET 'http://<user>:<pass>@<kibana url>/api/fleet/agents?perPage=10000&searchAfter=<nextSearchAfter>&pitId=<pit id>&pitKeepAlive=5m'
```
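For illustration, here is a hypothetical TypeScript loop that pages through all agents using the new parameters; it assumes the response shape described above, with agents under `items`, and is not part of this PR:
```ts
// Hypothetical client-side pagination loop; endpoint and fields follow the
// description above.
async function fetchAllAgents(kibanaUrl: string, authHeader: string): Promise<unknown[]> {
  const headers = { Authorization: authHeader };
  // Open a PIT on the first request so later pages see a consistent snapshot.
  let url = `${kibanaUrl}/api/fleet/agents?perPage=10000&openPit=true&pitKeepAlive=5m`;
  const agents: unknown[] = [];
  while (true) {
    const body: any = await (await fetch(url, { headers })).json();
    agents.push(...(body.items ?? []));
    if (!body.nextSearchAfter || !body.items?.length) break;
    // Subsequent pages reuse the PIT ID and pass the cursor back.
    url =
      `${kibanaUrl}/api/fleet/agents?perPage=10000&pitKeepAlive=5m` +
      `&pitId=${encodeURIComponent(body.pit)}` +
      `&searchAfter=${encodeURIComponent(body.nextSearchAfter)}`;
  }
  return agents;
}
```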
## Testing
I recommend using `scripts/create_agents` to generate agents in bulk and
test the requests above. You can generate new agents between PIT
requests to verify that using a PIT ID retains the original state. (An
API functional test was added for this.)
Note: you may need to add `&showInactive=true` to all requests if your
fake agents become inactive.
### Checklist
Check that the PR satisfies the following conditions.
Reviewers should verify this PR satisfies this list as well.
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
## Summary
The navigation test was skipped in MKI because opening the Maps page
triggered a modal that prevented navigating away from Maps to continue
the test.
Opening the Maps page has since been removed from the navigation test
suite, so this test no longer needs to be skipped in MKI.
Closes #196823
### Checklist
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
When running the download script, the following warning was printed at
the end:
```
Warning: Got more output options than URLs
```
This fixes the warning by removing the `-O` option. Removing `--output`
instead would not work, as the file on disk does not have the same name
as the one on the remote server.
As the issue exists on both macOS and Linux, both were fixed. I did a
quick manual test on Debian and OS X; both worked as expected.
Fixes https://github.com/elastic/kibana/issues/212523
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This PR contains the following updates:
| Package | Update | Change |
|---|---|---|
| docker.elastic.co/wolfi/chainguard-base | digest | `6dcddd8` -> `10f7cda` |
---
### Configuration
📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).
🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.
♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.
---
- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box
---
This PR has been generated by [Renovate
Bot](https://redirect.github.com/renovatebot/renovate).
Co-authored-by: elastic-renovate-prod[bot] <174716857+elastic-renovate-prod[bot]@users.noreply.github.com>
I've noticed some serverless projects would encounter `503` errors
shortly after "resuming". When this happens, Elasticsearch needs time to
restore indices and their data before it can fulfill requests
successfully. It was recommended to wait for the cluster / index to have
a healthy green (serverless) / yellow (stateful) status before starting
to run background tasks. This way the task manager will not encounter
503 errors as often, which would otherwise be reflected in the metrics.
There are a few functional details to the changes I've made:
- Narrows the health call to the task manager index only
- Waits for green on serverless and yellow on stateful
- Has a timeout of 30s
- Will start claiming tasks after the timeout or when the API call
returns an error, to prevent a node from not claiming tasks at all
(reduces risk and makes the introduction of this new constraint
smoother); a sketch follows this list
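A minimal sketch of that gating logic using the Elasticsearch JS client; the index name and function shape are assumptions, not the exact implementation:
```ts
import type { Client } from '@elastic/elasticsearch';

const TASK_MANAGER_INDEX = '.kibana_task_manager'; // assumed index name

export async function waitForTaskManagerIndexHealth(esClient: Client, serverless: boolean) {
  try {
    await esClient.cluster.health({
      index: TASK_MANAGER_INDEX, // narrow the health call to the task manager index
      wait_for_status: serverless ? 'green' : 'yellow',
      timeout: '30s', // give up waiting after 30s
    });
  } catch (e) {
    // Fall through on timeout/error so this node still starts claiming tasks.
  }
  // Start claiming tasks here either way.
}
```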
## To verify
- Ensure code reflects functional requirements
- Verify unit tests validate the functionality on various code paths
- Ensure Kibana starts claiming tasks on startup once the health API
responds (can also check on serverless and ECH. I spun up one of each
with this PR)
---------
Co-authored-by: Ying Mao <ying.mao@elastic.co>
## Summary
Replaces many long lists of parameters with `sharedParams` - a list of
commonly used inputs from the shared security rule wrapper.
`sharedParams` should be treated as immutable throughout the entire rule
execution to eliminate confusion about which params are specific to
certain code paths and which ones are simply passed through from the
shared wrapper.
More refactoring will follow to further reduce pass-through parameter
passing. I attempted to limit the scope of changes in this PR by
destructuring `sharedParams` into the expected param format for some
functions. This also sets us up to remove function passing of
`wrapHits`, `bulkCreate`, etc., which would have required passing more
of these individual shared params deep into rule execution logic.
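As a rough illustration of the pattern (the type and function names below are hypothetical, not the PR's actual signatures):
```ts
// Hypothetical shapes, for illustration only.
interface SecuritySharedParams {
  spaceId: string;
  ruleId: string;
  // ...the other commonly used inputs from the shared security rule wrapper
}

// A legacy helper that still expects its own param shape.
function fetchSourceEvents(params: { spaceId: string; ruleId: string }) {
  /* ... */
}

// Rule executors receive the immutable bag once and destructure at call
// sites, instead of threading each value through every function signature.
function executeRule(sharedParams: Readonly<SecuritySharedParams>) {
  const { spaceId, ruleId } = sharedParams;
  fetchSourceEvents({ spaceId, ruleId });
}
```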
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Closes [#213209](https://github.com/elastic/kibana/issues/213209)
### Ordering Issue
- The instruction about `retrieve_elastic_doc` appears before the
`get_dataset_info` instruction.
- The content is the same, but the order of instructions has changed,
causing a failure in an exact string match.
### Minor Formatting Differences
- Even slight variations in spacing, newlines, or indentations can cause
a test failure.
## Solution
Use `systemMessageSorted`: the order of instructions can vary, so we sort them before comparing.
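A minimal sketch of such an order-insensitive comparison, assuming the system message is a list of instructions separated by blank lines (the real helper may differ):
```ts
// Sort instruction blocks before comparing so ordering differences
// don't fail the test.
const systemMessageSorted = (message: string): string =>
  message
    .split('\n\n')
    .map((instruction) => instruction.trim())
    .sort()
    .join('\n\n');

const a = 'Use retrieve_elastic_doc for docs.\n\nCall get_dataset_info first.';
const b = 'Call get_dataset_info first.\n\nUse retrieve_elastic_doc for docs.';
console.log(systemMessageSorted(a) === systemMessageSorted(b)); // true
```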
Closes https://github.com/elastic/kibana/issues/213444
The problem is that setting the view in the globe projection may not set
the view to the exact value. For example, setting zoom to 1.74 may move
the map to zoom 1.77. This PR resolves the problem by adding a margin of
error when comparing zoom differences.
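In other words, the assertion becomes tolerance-based rather than an exact match; a sketch (the actual threshold in the PR may differ):
```ts
const ZOOM_EPSILON = 0.05; // hypothetical margin of error

function zoomsMatch(actual: number, expected: number): boolean {
  return Math.abs(actual - expected) <= ZOOM_EPSILON;
}

zoomsMatch(1.77, 1.74); // true, where a strict equality check would fail
```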
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Closes #199315
## Summary
This PR changes the Maintenance Window UI to respect the date format
configured in Kibana's advanced settings.
3 places needed changing:
- Maintenance window list.
- Maintenance window creation page.
- Event popover in the maintenance window list (for recurring MWs).
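For reference, a minimal sketch of the pattern applied in those places, assuming the standard `dateFormat` advanced setting and a moment-based formatter (the actual code paths differ per component):
```ts
import moment from 'moment';
import type { IUiSettingsClient } from '@kbn/core/public';

// Format a maintenance window date using the format configured in
// Kibana's advanced settings.
function formatMaintenanceWindowDate(uiSettings: IUiSettingsClient, date: string): string {
  const dateFormat = uiSettings.get<string>('dateFormat');
  return moment(date).format(dateFormat);
}
```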
## Summary
As part of the Expandable Findings flyout work, we need to move some
constants, types, functions, and components into the Security Solution
plugin or a shared package.
This PR is phase 2 for Findings (the Misconfiguration flyout), which
includes moving functions into a shared package or the Security Solution
plugin.
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
## 📓 Summary
Closes https://github.com/elastic/streams-program/issues/102
Closes https://github.com/elastic/streams-program/issues/159
This rework of the enrichment state management introduces XState as the
state library, preparing the enrichment flow to scale to more processors
and improving performance by reducing unnecessary side effects.
## 🤓 Reviewers note
**There is a lot to digest in this PR; I'm open to any suggestions, and
I left some notes around to guide the review.
This is also far from perfect, as there is room for other minor DX
improvements in consuming the state machines, but those will come in
follow-up work after we resolve prioritized work such as integrating the
Schema Editor.**
Most of the changes in this PR are about state management for the stream
enrichment, but it also touches some other areas to integrate the
event-based flow.
### Stream enrichment machine
This machine handles the complexity around updating/promoting/deleting
processors and the available simulation states.
It's a root-level machine that spawns and manages its child machines:
one for the **simulation** behaviour and one for each **processor**
instantiated.
<img width="950" alt="Screenshot 2025-02-27 at 17 10 03"
src="https://github.com/user-attachments/assets/756a6668-600d-4863-965e-4fc8ccd3a69f"
/>
### Simulation machine
This machine handle the flow around sampling -> simulating, handling
debouncing and determining once a simulation can run or should refresh.
It also spawn a child date range machine to react to the observable time
changes and reloads.
It also derives all the required table configurations (columns, filters,
documents) centralizing the parsing and reducing the cases for
re-computing, since we don't rely anymore on the previous live
processors copy.
<img width="1652" alt="Screenshot 2025-02-27 at 17 33 40"
src="https://github.com/user-attachments/assets/fc1fa089-acb2-4ec5-84bc-f27f81cc6abe"
/>
### Processor machine
A processor can be in different states depending on its changes; note
that this machine tracks each processor independently and sends events
to the parent machine so it can react accordingly. This provides a boost
in performance compared to the previous approach, as we don't have to
re-render the whole page tree since the changes are encapsulated in the
machine state.
<img width="1204" alt="Screenshot 2025-03-04 at 11 34 01"
src="https://github.com/user-attachments/assets/0e6b8854-b7c9-4ee8-a721-f4222354d382"
/>
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
When enabling the entity store as a non-superuser that has all required
privileges, the following errors are returned:

To fix it, we need to disable security for the saved object client (a
sketch of the approach follows the list below). While this change sounds
scary (exclude security??), there are three reasons I believe this is
the appropriate fix:
* [It's what rules management/alerting/detections does for creating
their hidden/encrypted saved
objects](https://github.com/elastic/kibana/blob/main/x-pack/platform/plugins/shared/alerting/server/rules_client_factory.ts#L140).
I view that as the canonical example for doing this kind of work.
* Even with this change, we actually still require the user to have
Saved Object Management capabilities, both in the UI (as a privilege
check) and in the init/enable routes, upstream of where we create the
saved object. You can try this out yourself, the init route will fail
without that privilege.
* We only use that particular Saved Object client in that particular
spot, not throughout the rest of our Saved Object usages.
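A sketch of the approach, mirroring the alerting rules client factory linked above (the function name is hypothetical; `getScopedClient` and `SECURITY_EXTENSION_ID` are real core APIs):
```ts
import { SECURITY_EXTENSION_ID } from '@kbn/core-saved-objects-server';
import type { CoreStart, KibanaRequest } from '@kbn/core/server';

// Create a scoped Saved Objects client with the security extension
// excluded, after the route has already enforced its own privilege checks.
function getEntityStoreSoClient(core: CoreStart, request: KibanaRequest) {
  return core.savedObjects.getScopedClient(request, {
    excludedExtensions: [SECURITY_EXTENSION_ID],
  });
}
```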
### How to reproduce it
* On main branch
* With an empty cluster
* Generate data with doc generator
* Login with the 'elastic' user and create a test role and user with the
following credentials:
  * cluster: all
  * indices: all
  * Kibana: all spaces, all
* Open an anonymous tab and login with the test user
* Enable the entity store with the test user
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
## Summary
Implements controls to provide more visibility into errors, especially
in the initialization phase (populating ELSER indices).
### Changes
- Added timeout to the initialization phase (20 minutes).
- Added concurrency control for initialization tasks: only the first
concurrent migration triggers it, and the rest await it.
- Added proper error handling for the ES bulk index operations of
integrations and prebuilt rules ELSER indices.
- Added a timeout for individual agent invocations (3 minutes; see the
sketch after this list)
- Added `migrationsLastError` server state to store the errors (not
ideal; this should be moved to the migration index when we implement it,
but it's fine for now).
- Added the `last_error` in the _/stats_ API response.
- The UI displays the `last_error` if it's defined.
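The timeout controls could look roughly like this `Promise.race` sketch (names and shapes are illustrative, not the actual implementation):
```ts
const INIT_TIMEOUT_MS = 20 * 60 * 1000; // initialization phase
const AGENT_TIMEOUT_MS = 3 * 60 * 1000; // individual agent invocation

// Reject if the wrapped promise doesn't settle within `ms`.
async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: NodeJS.Timeout | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```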
### Screenshots
Onboarding error:

Rules page error:

---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
## Summary
Changes the `event.outcome` badge to no longer have an icon; it now
appears only when the `event.outcome` value is `failure`, showing as a
`danger`-colored badge.
<img alt="Event Outcome Discover Traces Screenshot 2025-03-04 173032"
src="https://github.com/user-attachments/assets/7c5ffc84-e483-4667-abed-d38461362351"
/>
Closes #213207
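The rendering rule boils down to something like this sketch (component and prop names are hypothetical; `EuiBadge` and its `color` prop are real EUI APIs):
```tsx
import React from 'react';
import { EuiBadge } from '@elastic/eui';

// Render no badge for `success`/`unknown`; render a danger badge, without
// an icon, only for `failure`.
function EventOutcomeBadge({ outcome }: { outcome?: string }) {
  if (outcome !== 'failure') return null;
  return <EuiBadge color="danger">{outcome}</EuiBadge>;
}
```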
### How to Test
Ensure the following is added to your kibana.dev.yml:
```yaml
discover.experimental.enabledProfiles:
- traces-data-source-profile
```
- Go to the Discover page and select the APM static data view when on
the oblt-cli cluster.
- On the data grid, the summary cells for trace data should show only 3
badges when `event.outcome` is either `success` or `unknown`; a red
badge is shown only for traces that have `event.outcome` as `failure`.