## Summary
While annotating test failures, we're seeing an increased number of
errors like this:
```
2025-03-21 13:52:32 INFO Artifact uploads completed successfully
--
| Annotate test failures error Request failed with status code 404
| HTTP Error Response Status 404
| HTTP Error Response Body { message: 'Not Found' }
| user command error: exit status 10
```
It would be nicer to show a bit more of the error to help with debugging.
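One way the annotation step could surface more context is by formatting the HTTP error before logging it. This is a hedged sketch only: the `AxiosErrorLike` shape and the `formatHttpError` helper are illustrative, not the actual Kibana implementation.

```typescript
// Illustrative sketch: richer error output for the annotation step.
interface AxiosErrorLike {
  message: string;
  config?: { method?: string; url?: string };
  response?: { status: number; data: unknown };
}

function formatHttpError(error: AxiosErrorLike): string {
  const lines = [`Annotate test failures error: ${error.message}`];
  // Including the request target makes a 404 much easier to track down.
  if (error.config?.url) {
    lines.push(`Request: ${(error.config.method ?? 'GET').toUpperCase()} ${error.config.url}`);
  }
  if (error.response) {
    lines.push(`HTTP Error Response Status: ${error.response.status}`);
    lines.push(`HTTP Error Response Body: ${JSON.stringify(error.response.data)}`);
  }
  return lines.join('\n');
}
```

With the request URL in the output, a `404 Not Found` immediately tells you which endpoint was missing.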
## Summary
Similar to https://github.com/elastic/kibana/pull/195581
Adds a pipeline that builds Kibana and starts a cloud deployment without
going through the CI test suites (as normal pull-request pipeline runs
do). It can be useful if a developer would like to save time/compute on
re-building/re-testing the whole project before deploying to the cloud.
The labels (`ci:cloud-deploy` / `ci:cloud-redeploy`) are required, as in
the usual CI flow.
Related to: https://github.com/elastic/kibana-operations/issues/121
## Summary
Extends the Scout reporter with a `failed-test-reporter` that saves
failures in a JSON summary file. For each test failure, an HTML report
file is generated and linked in the summary report:
```
[
{
"name": "stateful - Discover app - saved searches - should customize time range on dashboards",
"htmlReportFilename": "c51fcf067a95b48e2bbf6098a90ab14.html"
},
{
"name": "stateful - Discover app - value suggestions: useTimeRange enabled - dont show up if outside of range",
"htmlReportFilename": "9622dcc1ac732f30e82ad6d20d7eeaa.html"
}
]
```
This PR updates `failed_tests_reporter_cli` to look for potential Scout
test failures and re-generate test failure artifacts in the same format
we already use for FTR ones.
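A minimal sketch of how the JSON summary above could be produced. Deriving the HTML report filename from a hash of the test name is an assumption for illustration, not necessarily how the Scout reporter derives it.

```typescript
import { createHash } from 'crypto';

// Shape matching the summary entries shown above.
interface ScoutFailure {
  name: string;
  htmlReportFilename: string;
}

// Hypothetical: derive a stable report filename by hashing the test name.
function summaryEntry(testName: string): ScoutFailure {
  const hash = createHash('md5').update(testName).digest('hex');
  return { name: testName, htmlReportFilename: `${hash}.html` };
}

function buildFailureSummary(testNames: string[]): string {
  return JSON.stringify(testNames.map(summaryEntry), null, 2);
}
```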
These new artifacts are used to list failures in BK annotation:
<img width="1092" alt="image"
src="https://github.com/user-attachments/assets/09464c55-cdaa-45a4-ab47-c5f0375b701c"
/>
test failure html report example:
<img width="1072" alt="image"
src="https://github.com/user-attachments/assets/81f6e475-1435-445d-82eb-ecf5253c42d3"
/>
Note for reviewer: 3 Scout + 1 FTR tests were intentionally "broken" to
demonstrate the reporter; those changes must be reverted before merge.
See the failed pipeline
[here](https://buildkite.com/elastic/kibana-pull-request/builds/266822)
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
## Summary
In #195581 we added the option to deploy through clickable triggers,
but in its current state it's broken in several respects.
(1) It does not start on click. Triggering resulted in a 422 on
Buildkite's side, and after digging more into it, this was the error:
<img width="1019" alt="Screenshot 2024-10-16 at 16 53 13"
src="https://github.com/user-attachments/assets/f602dde9-2cc4-474f-b432-a3d4f9d5ae91">
Apparently, building PRs needs to be enabled on jobs that want to be
triggered through the PR bot.
(2) It is set up to run regardless of the labels.
(3) There's no feedback on runs.
## Changes
This PR:
- enables buildability in the pipeline's config
- exits early if deploy labels are missing
- adds a comment on the PR if a deploy job is started or finished
- removes the Kibana build step; it's not needed, as we already have a
step to build the Docker image
TODO:
- [x] Add feedback about a started job (either through a non-required
check, or a github comment)
- [x] Early exit if a label is missing
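The early-exit label check could look like the sketch below. Treating `GITHUB_PR_LABELS` as a comma-separated label list is an assumption about the PR bot's environment, made for illustration.

```typescript
// Labels that allow the deploy job to proceed.
const DEPLOY_LABELS = ['ci:cloud-deploy', 'ci:cloud-redeploy'];

// Hypothetical: rawLabels comes from a comma-separated env var
// such as GITHUB_PR_LABELS (assumption about the format).
function hasDeployLabel(rawLabels: string | undefined): boolean {
  const labels = (rawLabels ?? '').split(',').map((label) => label.trim());
  return DEPLOY_LABELS.some((label) => labels.includes(label));
}
```

The pipeline would call this first and exit zero when it returns `false`, so label-less triggers end quickly and cheaply.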
Several other builds are being started right now because of the logic
that triggers a build when a draft is changed to ready. To be fixed in
https://github.com/elastic/buildkite-pr-bot/issues/78
Tested manually by enabling the option in the UI and triggering through
the checkbox:
https://buildkite.com/elastic/kibana-deploy-project-from-pr/builds/23
This is https://github.com/elastic/kibana/pull/194768 without the merge
conflicts.
Switches over to the org wide PR bot, with backwards compatibility for
both versions.
Updating the pipeline definition here sets environment variables
globally on all branches, so I intend to merge the backports first to
support both versions and then proceed with this.
## Summary
The problem we're trying to solve here is getting access to
`elasticsearch-serverless` logs when the containers are started in
Docker in the background (and `elasticsearch` as well, although we
don't currently test against it in Docker).
## Solution
In essence:
- we needed to remove the `--rm` flag, which allows the containers to
stay around after they're done
- after this, we can run `docker logs ...` in FTR post-hooks, save the
output, then archive these files to Buildkite
- because the containers are no longer removed upon finishing, we need
to clean up dangling containers before starting up
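The three steps above can be sketched as command builders. This is illustrative only: the container names, target directory, and helper functions are assumptions, not the actual FTR hook code.

```typescript
// One `docker logs` invocation per container, capturing stderr too,
// so the files can be archived as Buildkite artifacts afterwards.
function logCaptureCommands(containerNames: string[], targetDir: string): string[] {
  return containerNames.map(
    (name) => `docker logs ${name} > ${targetDir}/${name}.log 2>&1`
  );
}

// Without --rm, exited containers linger; prune them before starting fresh ones.
function cleanupCommand(): string {
  return 'docker container prune --force';
}
```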
Backporting is probably not necessary, because this is only applicable
to serverless - and serverless is only supposed to run on main.
Solves: https://github.com/elastic/kibana/issues/191505
## Summary
- Removes SSH info to avoid confusion since we cannot SSH into agents on
the new infra
- Removes old agent metrics and logs links because they are in a
different cluster and the new links are in an annotation
## Summary
This PR refactors a bit of the pre-command env setup, separating parts
so they can be individually skipped. It also removes the setup avoidance
based on agent types, as this won't be useful after the migration.
Also, it fixes a missed bit in the agent-targeting rewrite used for the
migration, where the `provider: 'gcp'` was missing, and adds an optional
targeting for the script.
- add gcp as provider to all rewritten agent targeting rules
- add option to target specific pipelines
- refactor env-var loading into a separate file
- refactor node installs so they can be switched by a flag
- skip node installation in (some) jobs that don't require it
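The two toggles described above might look like the following sketch. Both the `SKIP_NODE_SETUP` flag name and the targeting helper are made up for illustration.

```typescript
// Hypothetical flag to skip the node install in jobs that don't need it.
function shouldInstallNode(env: Record<string, string | undefined>): boolean {
  return env.SKIP_NODE_SETUP !== 'true';
}

// Optional targeting: with no explicit target list, every pipeline is rewritten.
function pipelineIsTargeted(pipelineSlug: string, targets?: string[]): boolean {
  return !targets || targets.length === 0 || targets.includes(pipelineSlug);
}
```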
## Summary
Adds HTML snapshots to be uploaded on serverless functional test
failures. Login flakiness is a good example where this might help us
better understand why some elements were not found.
<img width="1316" alt="Screenshot 2024-02-14 at 09 56 56"
src="82a66af2-ac23-47d7-bcc2-13e17ae6ff6d">
## Summary
As we move to the elastic-wide Buildkite agents, and away from the
kibana-buildkite-managed ones, we won't have default access to the
buckets we used to use, as the assumed service account will differ.
**Note:** Although this will only be required on the new infra, this
change can be merged and is expected to work properly on the current
infra as well.
### Solution
We've set up a central service-account with rights to impersonate other
service accounts that have controlled access to individual buckets to
minimize the reach and influence of individual accounts. See:
https://github.com/elastic/kibana-operations/pull/51
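The impersonation pattern amounts to each bucket getting its own service account, with uploads impersonating it explicitly. A sketch under stated assumptions: the account and bucket names below are placeholders, while `--impersonate-service-account` is a real `gcloud` flag.

```typescript
// Build a gcloud upload command that impersonates a bucket-specific
// service account, keeping the central account's own permissions minimal.
function impersonatedUploadCmd(localPath: string, bucket: string, serviceAccount: string): string {
  return [
    'gcloud storage cp',
    localPath,
    `gs://${bucket}/`,
    `--impersonate-service-account=${serviceAccount}`,
  ].join(' ');
}
```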
**Several of the changes weren't tested, as they're part of CI tasks
outside the PR build** - we'll merge with caution and monitor stability
afterwards.
TODO: _add access, and assume account before other GCS bucket usages_
- [x] storybook
- [x] coverage
(.buildkite/scripts/steps/code_coverage/reporting/uploadPrevSha.sh)
- [x] upload static site
(.buildkite/scripts/steps/code_coverage/reporting/uploadStaticSite.sh)
- [x] SO object migration
(.buildkite/scripts/steps/archive_so_migration_snapshot.sh)
- [x] ES Snapshot manifest upload
(.buildkite/scripts/steps/es_snapshots/create_manifest.ts)
- [x] Scalability?
(.buildkite/scripts/steps/functional/scalability_dataset_extraction.sh)
- [x] Benchmarking
(.buildkite/scripts/steps/scalability/benchmarking.sh)
- [x] Webpack bundle analyzer
(.buildkite/scripts/steps/webpack_bundle_analyzer/upload.ts)
- [x] ~Build chromium (x-pack/build_chromium/build.py)~ Not needed, as
it's manual, and not a CI task
TODO: _others_
- [x] Remove manifest upload
(.buildkite/scripts/steps/es_serverless/promote_es_serverless_image.sh)
- [x] Decide if we should merge with the CDN access: no, SRE is managing
that account
- [x] Bazel remote cache seems to also rely on gcs - roles PR:
https://github.com/elastic/kibana-operations/pull/56
Closes: https://github.com/elastic/kibana-operations/issues/29
Part of: https://github.com/elastic/kibana-operations/issues/15
In https://github.com/elastic/kibana/pull/173159 we authenticated with
another service account, and were no longer operating under the expected
config. This was causing `gcloud secrets` to access the wrong project
and throw errors.
This revokes the service account after we're done uploading CDN assets
so we can switch back to the default service account.
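The activate/upload/revoke sequence can be sketched as below. The paths, bucket, and account names are placeholders, not the real values; `gcloud auth activate-service-account` and `gcloud auth revoke` are real gcloud commands.

```typescript
// Sketch of the sequence: switch to the CDN service account for the
// upload, then revoke it so later steps use the default account again.
function cdnUploadPlan(keyFile: string, assetsDir: string, bucket: string, account: string): string[] {
  return [
    `gcloud auth activate-service-account ${account} --key-file=${keyFile}`,
    `gsutil -m cp -r ${assetsDir} gs://${bucket}/`,
    `gcloud auth revoke ${account}`,
  ];
}
```

The final revoke is the fix described above: without it, subsequent `gcloud secrets` calls keep running as the CDN account against the wrong project.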
---------
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
This uploads CDN assets to a GCS bucket on commit and after all tests
have passed. This will run on pull requests with `ci:project-deploy-*`
and `ci:build-serverless-image` labels, and on `main`. Assets will
include the first 12 characters of the commit sha as a base path.
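The base-path derivation is simple enough to sketch; the helper name is illustrative.

```typescript
// CDN assets are namespaced by a 12-character commit sha prefix.
function cdnBasePath(commitSha: string): string {
  return commitSha.slice(0, 12);
}

// An asset's full object path combines the prefix with its relative path.
function cdnObjectPath(commitSha: string, assetPath: string): string {
  return `${cdnBasePath(commitSha)}/${assetPath}`;
}
```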
## Summary
We're moving to a different Vault address/instance as we move onto the
elastic-wide Buildkite infra. While the migration is in progress, we can
bridge between the two using this solution.
✅ Tested the status quo by running the PR pipeline (tests all the loads
from `pre-command`) and by using `ci:cloud-deploy` (tests vault
writing).
🟠 Tested the new vault provider on this PR:
https://github.com/elastic/kibana/pull/171317
The secrets can be accessed, *but they can't be written* (neither by me
nor by the PR pipeline). Change requested here:
https://elasticco.atlassian.net/browse/ENGPRD-414
However, this PR can be merged without figuring out write access to
secrets; it will work as long as we're on the `kibana-buildkite` infra.
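The bridging itself could be as simple as picking the Vault address from an env flag. This is a hedged sketch: the flag name and both addresses are placeholders, not the real configuration.

```typescript
// Placeholder addresses for the two Vault instances.
const LEGACY_VAULT_ADDR = 'https://vault.legacy.example.dev';
const NEW_VAULT_ADDR = 'https://vault.elastic-wide.example.dev';

// Hypothetical flag: default to the legacy instance until the migration lands.
function resolveVaultAddr(env: Record<string, string | undefined>): string {
  return env.USE_NEW_VAULT === 'true' ? NEW_VAULT_ADDR : LEGACY_VAULT_ADDR;
}
```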
---
Closes: https://github.com/elastic/kibana-operations/issues/28
Based on: https://github.com/elastic/kibana/pull/157220
---------
Co-authored-by: Jon <jon@budzenski.me>
## Summary
Connected to: https://github.com/elastic/kibana-operations/issues/18
Pre-requisite for:
https://github.com/elastic/kibana-operations/issues/30
You can test the current assistant from the branch:
https://buildkite.com/elastic/kibana-serverless-release-1/builds?branch=buildkite-job-for-deployment
- use `DRY_RUN=1` in the runtime params to not trigger an actual release
:)
This PR creates the contents of a Buildkite job to assist the very
beginning of the Kibana Serverless Release initiation process and lays
some groundwork for further additions to release management.
At the end of the day, we would like to create a tag
`deploy@<timestamp>` which will be picked up by another job that listens
for these tags: https://buildkite.com/elastic/kibana-serverless-release.
However, several parts of the preparation for release require manual
research, collecting information about target releases, running scripts,
etc.
Any further additions that would be useful for someone wanting to start
a release could live here.
Furthermore, we could also trigger downstream jobs from here. e.g.:
https://buildkite.com/elastic/kibana-serverless-release is currently set
up to listen for a git tag, but we may as well just trigger the job
after we've created a tag.
Check out an example run at:
https://buildkite.com/elastic/kibana-serverless-release-1/builds/72
(visible only if you're a
member of @ elastic/kibana-release-operators)
Missing features compared to the git action:
- [x] Slack notification about the started deploy
- [x] full "useful links" section
Missing features:
- [x] there's a bit of useful context that should be integrated to the
display of the FTR results (*)
- [x] skip listing and analysis if a commit sha is passed in env
(*) - Currently, we display the next FTR test suite that ran after the
merge of the PR. However, the first FTR run that actually contains the
changes, and shows useful info related to the changeset, is the one
that runs after the first successful on-merge build following the merge
commit. Meaning: if main is failing when the change is merged, the next
FTR suite won't pick up the change right away.
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Thomas Watson <w@tson.dk>
Co-authored-by: Thomas Watson <watson@elastic.co>
## Summary
Fixes failed build steps when there are no tests to execute and hence no
reports.
## Details
Although this [PR](https://github.com/elastic/kibana/pull/165824) fixed
the failure that occurred when there were no reports but
`yarn junit:merge` still tried to convert them to JUnit format, the
problem partially remains: `Failed Test Reporter` runs for every build
step, including successful ones. We don't need to handle failed test
reports for successful build steps. There are cases when there are no
tests to run, so Cypress doesn't produce reports, nothing is converted
to JUnit format, and `Failed Test Reporter` fails, causing the build
step to fail. This could be solved by defining the environment variable
`DISABLE_MISSING_TEST_REPORT_ERRORS=true`, but we need to make sure
reports exist for failed tests. Taking this into account, we can rely on
`BUILDKITE_COMMAND_EXIT_STATUS` to avoid running `Failed Test Reporter`
if the build step succeeded.
One may ask why we don't skip `Upload Artifacts` too: we may still need
some build artifacts from successful build steps.
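The check described above can be sketched as follows. `BUILDKITE_COMMAND_EXIT_STATUS` is the env var named in this PR; treating a missing value as "don't run" is an assumption for the sketch.

```typescript
// Run the Failed Test Reporter only when the step's command actually failed.
function shouldRunFailedTestReporter(env: Record<string, string | undefined>): boolean {
  const exitStatus = env.BUILDKITE_COMMAND_EXIT_STATUS;
  return exitStatus !== undefined && exitStatus !== '0';
}
```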
---------
Co-authored-by: Tiago Costa <tiago.costa@elastic.co>
## Summary
Kibana's build jobs work on a different subset of executors than the
buildkite pipelines defined in `catalog-info.yaml`.
The former jobs require a set of `pre_command` / `post_command` steps to
prepare the jobs for building/testing kibana.
The latter don't have access rights to certain Vault secrets (and
possibly other config from the Kibana world), but for now they also
aren't building Kibana; they just trigger other jobs, so we can simply
skip the problematic hooks.
~~A probably good indicator I found for deciding whether we need the
kibana-related `pre_command` is the
`BUILDKITE_AGENT_META_DATA_AGENT_MANAGER` flag, that's set to `"kibana"`
in the case of the kibana executors.~~
We can try to match on the agent names of the CI-systems agents. They
seem to start with `bk-agent`.
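The name-prefix match could be as small as the sketch below; that the agent name would come from something like `BUILDKITE_AGENT_NAME` is an assumption for illustration.

```typescript
// CI-systems agents appear to use names prefixed with "bk-agent";
// for those, skip the kibana-specific pre_command setup.
function isCiSystemsAgent(agentName: string | undefined): boolean {
  return (agentName ?? '').startsWith('bk-agent');
}
```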
This should allow for the
[kibana-tests](https://buildkite.com/elastic/kibana-tests) job to run.
Split from: https://github.com/elastic/kibana/pull/165346
---------
Co-authored-by: Tiago Costa <tiago.costa@elastic.co>
When buildkite-agent is uploading a pipeline we can skip setting up our
node environment. This stricter check avoids matching on similarly named
steps, e.g. our storybooks upload.
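A stricter match along the lines described might anchor on the exact command rather than a substring; the helper below is a sketch, not the actual hook code.

```typescript
// Match only a real `buildkite-agent pipeline upload` command, so a
// similarly named step (e.g. a storybooks upload) doesn't also match.
function isPipelineUploadStep(command: string): boolean {
  return /^buildkite-agent pipeline upload\b/.test(command.trim());
}
```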
Adds four pipeline steps for running serverless common, observability,
search, and security suites on pull requests.
While tests go through a stabilization period, configs will need to be
added to `serverless_ftr.sh` in addition to `ftr_configs.yml`. These tests
will be allowed to fail without causing a build failure. After
stabilization and integration with the primary test pipeline, we can
revert these changes.