## Summary
- Centralized Scout reporter settings
- Added owner area and config/test file information to reporter events
- Attempt to upload events at the end of a test run
- Enable Scout reporter test events upload for the `pull request` and
`on merge` pipelines
## Summary
Part of https://github.com/elastic/kibana-team/issues/1271
This PR introduces the first set of end-to-end integration tests for
the inference APIs, and the tooling required to do so (see the issue
for more context)
- Add a dedicated pipeline for ai-infra GenAI tests. The pipeline is
triggered when:
- genAI stack connectors, or ai-infra owned code is changed
- when the `ci:all-gen-ai-suites` label is present on a PR
- on merge
- Adapt the `ftr_configs.sh` script to load the GenAI connector
configuration from vault when a specific env var is set
- Create the `@kbn/gen-ai-functional-testing` package, which for now
only contains utilities to load the GenAI connector configuration in FTR
tests
- Add FTR integration tests for the `chatComplete` API of the
`inference` plugin
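As a rough illustration of how an FTR test might consume the new package, here is a minimal sketch; the helper name, config shape, and route below are assumptions rather than the package's actual API:
```typescript
import type { FtrProviderContext } from '../ftr_provider_context';
// Hypothetical helper; the real export of @kbn/gen-ai-functional-testing may differ.
import { loadGenAiConnectorConfig } from '@kbn/gen-ai-functional-testing';

export default function ({ getService }: FtrProviderContext) {
  const supertest = getService('supertest');

  describe('inference chatComplete', () => {
    // Connector definitions that ftr_configs.sh loaded from vault
    // (only when the dedicated env var is set on CI).
    const connectors = loadGenAiConnectorConfig();

    for (const [name, connector] of Object.entries(connectors)) {
      it(`returns a completion using the "${name}" connector`, async () => {
        await supertest
          .post('/internal/inference/chat_complete') // illustrative route
          .set('kbn-xsrf', 'true')
          .send({
            connectorId: connector.id,
            messages: [{ role: 'user', content: 'Hello' }],
          })
          .expect(200);
      });
    }
  });
}
```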
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Currently CI is configuring a yarn local mirror that is ignored due to
the repository `.yarnrc` taking precedence.
Instead of configuring this setting, this moves the cached mirror over
to the Kibana directory in line with the repository's configuration.
Chromedriver is currently downloaded at runtime on each agent. We know
the expected version of Chrome at image build time, and can re-use the
matching driver already installed instead.
This sets `XDG_CACHE_HOME` to `$HOME/.cache` to persist the chromedriver
installation. Details on the specification can be found at
https://specifications.freedesktop.org/basedir-spec/latest/. Other
packages, including cypress, playwright, bazelisk and yarn also respect
this environment variable, but are already falling back to the
`$HOME/.cache` directory.
This also removes `CHROMEDRIVER_FORCE_DOWNLOAD`, which I believe is an
artifact of legacy code:
https://github.com/elastic/kibana/blob/6.7/.ci/packer_cache.sh#L17-L26.
At one point node_modules was loaded from an archive to speed up
bootstrap times. The intent was to re-download chromedriver because the
Chrome version on the agent image was upgraded independently of the
bootstrap cache, potentially causing version mismatches. The impact of
re-downloading was also less significant back then, as there was less
agent-level parallelization; instead, a few large machines ran jobs in
parallel.
Currently CI is configuring a yarn offline mirror outside of the Kibana
directory, with the intention of caching assets during image build. This
configuration is ignored due to .yarnrc taking precedence, resulting in
the offline mirror being setup in the local Kibana installation. On CI
start, a fresh checkout of the repository is made and the cache
directory is empty.
Instead of setting a user-level configuration, this modifies `.yarnrc`
with the intended directory.
## Summary
Kibana requires security to be enabled and a platinum or better license
to run in FIPS mode.
Since not all FTR configs assume these conditions will be enabled, we
can't run every test, so the failing tests will be skipped when these
overrides are enforced.
This does not mean that the functionality is not supported in FIPS mode.
## What is the point?
Running these tests in FIPS mode is not necessarily to check that the
functionality works as expected, it is to make sure Kibana does not
crash due to unsupported algorithm usage (`md4`, `md5`, etc).
When running in FIPS mode, Node will throw an `unsupported envelope
function` error if it encounters an unsupported algorithm, so the more
lines of code covered, the more assurance we can have that features will
work in FIPS mode.
## Nature of the changes
To skip a test, a `tag` is added: `this.tags('skipFIPS')`
`this.tags` is only available for `describe('description', function()
{...});`
There should not be any logical changes, just tests wrapped in an extra
block.
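For example, a skipped suite ends up looking roughly like this (the suite content here is illustrative):
```typescript
// `this.tags()` requires the `function () {}` form of describe, not an arrow function.
describe('my feature', function () {
  this.tags('skipFIPS');

  describe('when using a non-FIPS-approved algorithm', () => {
    it('still behaves as before', async () => {
      // original test body, unchanged
    });
  });
});
```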
I tried to make the wording in the new `describe` blocks "flow" 😅 If
you prefer different wording in the new `describe` blocks - please add a
change!
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Nikita Indik <nikita.indik@elastic.co>
## Summary
We've seen several issues stemming from docker failures, like:
```
Error response from daemon: Get "https://docker.elastic.co/v2/": ...
```
This is happening because of rolling docker updates, and should affect
each node for a short time, while connections are draining. The
suggestion was to implement retry logic for docker operations. This PR
tries to cover much of it.
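The retry wrapper is conceptually along these lines; this is a minimal sketch rather than the actual helper, and the command, attempt count, and delay are illustrative:
```typescript
import { execSync } from 'child_process';
import { setTimeout as sleep } from 'timers/promises';

// Run a shell command, retrying a few times before giving up.
async function execWithRetries(command: string, attempts = 3, delayMs = 15_000) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      execSync(command, { stdio: 'inherit' });
      return;
    } catch (error) {
      if (attempt === attempts) throw error;
      console.warn(`"${command}" failed (attempt ${attempt}/${attempts}), retrying...`);
      await sleep(delayMs);
    }
  }
}

// Example: a pull that may hit a registry node that is draining connections.
void execWithRetries('docker pull docker.elastic.co/some/image:tag');
```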
## Summary
Use TypeScript's async child processes to start quick checks in
parallel 👍 (a rough sketch of the pattern is at the end of this
summary)
Check out these runs:
- happy case:
https://buildkite.com/elastic/kibana-pull-request/builds/227443#01914ca3-1f0d-4178-b539-263fbc588e98
- some broken checks:
https://buildkite.com/elastic/kibana-pull-request/builds/228957#01917607-f7bd-4e08-8c70-7fdc3f9c12d1
Benefits:
- with this (+more CPU) we can speed up the quick-check step's runtime,
from ~15m to ~7m.
- an added benefit is that all checks run to completion, so we won't
bail on the 1st one
Disadvantages:
- uglier error output, since we collect the logs asynchronously and
print them only upon failure
- ~no output printed for happy checks (can be changed)~
Extra:
- additionally, `yarn quick-checks` will now allow devs to run these
checks locally (adjustments made so that the checks won't fail in local
dev)
- added the option to declare a 'context' for tooling loggers, so we
can identify which script is logging
Solves 2/3 of https://github.com/elastic/kibana-operations/issues/124
(+speedup)
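The parallel-execution pattern looks roughly like this; the check commands and error handling are illustrative, not the actual implementation:
```typescript
import { exec } from 'child_process';
import { promisify } from 'util';

const execAsync = promisify(exec);

// Illustrative subset; the real step assembles the full list of quick checks.
const QUICK_CHECKS = [
  'node scripts/eslint_with_types',
  'node scripts/check_licenses',
];

async function runQuickChecks() {
  const results = await Promise.allSettled(
    QUICK_CHECKS.map((command) =>
      execAsync(command, { maxBuffer: 100 * 1024 * 1024 })
    )
  );

  // Logs are buffered per check and only printed on failure, which is why
  // the error output is less readable than with sequential execution.
  const failures = results.filter((r): r is PromiseRejectedResult => r.status === 'rejected');
  for (const failure of failures) {
    console.error(failure.reason.stdout ?? failure.reason);
  }
  if (failures.length > 0) process.exit(1);
}

void runQuickChecks();
```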
jq isn't installed on the container running a scan. It isn't needed, so
this wraps the statement in an `if binary exists` block.
This also enables slack notifications on failure.
https://buildkite.com/elastic/kibana-sonarqube/builds/224
Closes https://github.com/elastic/kibana/issues/186631
Closes https://github.com/elastic/kibana-operations/issues/151
Adds a daily pipeline for running our jest and integration tests against
a Node.js distribution with pointer compression enabled. This is enabled
by setting the environment variable
`CI_FORCE_NODE_POINTER_COMPRESSION=true`
I would prefer a cleaner implementation, but I'm not seeing a way around
it without changing our defaults globally. Open to ideas. We have to
update three downloads:
1) base node.js install, for jest
2) build node.js install, for integration tests
3) bazel workspace install, for dependencies
https://buildkite.com/elastic/kibana-pointer-compression/builds/6
---------
Co-authored-by: Tiago Costa <tiago.costa@elastic.co>
This PR removes the usage of the native module version of `re2` and
replaces it with a JS port called `re2js`.
It also ends our usage of native node modules in production and removes
the corresponding task from the build. Further steps will be taken, in
line with our strategy, to avoid future usages of native node modules in
prod environments.
## Summary
Closes #188272
A check was added in #181187 which detects if the environment has
FIPS-enabled NodeJS but Kibana is not set up properly. This adds the
Kibana setting for FIPS in CI and the Docker image. Note there are still
license issues on some tests due to #181187 as well, but those will be
handled in another PR.
## Goal
We'd like to introduce a way to run pipelines that have a dependency on
the currently active branch set (managed in
[versions.json](./versions.json)).
With this, we'd like to migrate over the `es-forward` pipelines
(currently:
[this](https://buildkite.com/elastic/kibana-7-dot-17-es-8-dot-15-forward-compatibility),
and
[this](https://buildkite.com/elastic/kibana-7-dot-17-es-8-dot-14-forward-compatibility))
to the new buildkite infra.
## Summary
This PR introduces a new pipeline:
https://buildkite.com/elastic/kibana-trigger-version-dependent-jobs
(through
[trigger-version-dependent-jobs.yml](.buildkite/pipeline-resource-definitions/trigger-version-dependent-jobs.yml)).
The purpose of this new pipeline is to take the name of a "pipelineSet"
that refers to a pipeline and, based on the `versions.json` file, work
out the branches on which the referred pipeline should be triggered.
### Example: `Trigger ES forward compatibility tests`
- a scheduled run on
[kibana-trigger-version-dependent-jobs](https://buildkite.com/elastic/kibana-trigger-version-dependent-jobs)
with the env var `TRIGGER_PIPELINE_SET=es-forward` runs
- the pipeline implementation for
`kibana-trigger-version-dependent-jobs` works out (looking at
`versions.json`) that the `es-forward` set should trigger
https://buildkite.com/elastic/kibana-es-forward (doesn't exist prior to
the PR) for (7.17+8.14) and (7.17+8.15)
- the pipeline implementation uploads two trigger steps, running
https://buildkite.com/elastic/kibana-es-forward in two instances with
the relevant parameterization.
Since the trigger parameters are derived from the `versions.json` file,
if we later close `8.14` and open up `8.16`, this will follow
automatically, without having to update the pipeline resources or
schedules.
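Conceptually, the implementation does something along these lines; the `versions.json` shape, flag names, and env var are simplified assumptions, not the actual code:
```typescript
import { readFileSync } from 'fs';

interface VersionEntry {
  version: string; // e.g. "8.15.0"
  branch: string; // e.g. "main", "8.14", "7.17"
  previousMajor?: boolean;
}

// Sketch: pair the previous-major Kibana branch (7.17) with every open 8.x
// version, producing one Buildkite trigger step per combination.
function getEsForwardTriggers(versionsFile = 'versions.json') {
  const { versions } = JSON.parse(readFileSync(versionsFile, 'utf8')) as {
    versions: VersionEntry[];
  };

  const kibanaBranch = versions.find((v) => v.previousMajor)?.branch ?? '7.17';

  return versions
    .filter((v) => v.version.startsWith('8.'))
    .map((v) => ({
      trigger: 'kibana-es-forward',
      build: {
        branch: kibanaBranch,
        env: { ES_FORWARD_VERSION: v.version }, // env var name made up
      },
    }));
}

// Each entry is uploaded as a trigger step by the pipeline implementation.
console.log(JSON.stringify({ steps: getEsForwardTriggers() }, null, 2));
```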
## Changes
- 2 pipelines created:
[trigger-version-dependent-jobs.yml](.buildkite/pipeline-resource-definitions/trigger-version-dependent-jobs.yml),
[kibana-es-forward.yml](.buildkite/pipeline-resource-definitions/kibana-es-forward.yml)
- [x] add kibana-es-forward.yml
- implementation for `trigger-version-dependent-jobs` added
- branch configuration removed from pipelines (kibana-artifacts-staging,
kibana-artifacts-snapshot, kibana-artifacts-trigger)
- added a script for checking RREs validity (moved a few files)
## Verification
I've used the migration staging pipeline (*) to run this:
-
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/130
- Env: `TRIGGER_PIPELINE_SET="artifacts-trigger"`
- Result:
[(success):](https://buildkite.com/elastic/kibana-artifacts-trigger/builds/10806)
it triggered for 8.14 only (as expected)
-
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/131
- Env: `TRIGGER_PIPELINE_SET="es-forward"`
- Result: (success): it generated 2 trigger steps, but since the
es-forward pipeline doesn't exist, the upload step failed
-
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/132
- Env: `TRIGGER_PIPELINE_SET="artifacts-snapshot"`
- Result: (success): it triggered jobs for all 3 open branches
(main/8.14/7.17)
-
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/134
- Env: `TRIGGER_PIPELINE_SET="artifacts-staging"`
- Result: (success): it triggered 8.14 / 7.17, but not for main
(*note: this migration staging pipeline will come in handy even after
the migration, to stage newly created pipelines without creating the
resource up-front)
## Summary
These were used for testing the migration from the kibana-buildkite
infra to the elastic-wide buildkite infra. Now that we're done with
most of the migration, we can clean these up.
## Summary
- Closes https://github.com/elastic/kibana-operations/issues/100
- Utilizes FIPS agent from elastic/ci-agent-images#686
- Adds dynamic agent selection during PR pipeline upload (see the
sketch after this list)
- FIPS agents can be used with the `FTR_ENABLE_FIPS_AGENT` env variable
or the `ci:enable-fips-agent` label
- Removes agent image config from individual steps in favor of image
config for the whole pipeline.
- Steps can still override this config by adding `image`, `imageProject`,
etc.
- Adds a conditional assertion to the `Check` CI step which validates
that FIPS is working properly
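The dynamic selection at upload time boils down to something like this; the image names are made up, and `GITHUB_PR_LABELS` is assumed to carry the PR labels:
```typescript
// Decide once, at pipeline-upload time, which agent image the whole pipeline uses.
const useFipsAgent =
  process.env.FTR_ENABLE_FIPS_AGENT === 'true' ||
  (process.env.GITHUB_PR_LABELS ?? '').includes('ci:enable-fips-agent');

// Steps can still override `image`, `imageProject`, etc. in their own agent config.
export const DEFAULT_AGENT_IMAGE_CONFIG = {
  provider: 'gcp',
  image: useFipsAgent ? 'kibana-fips-ubuntu-2004' : 'kibana-ubuntu-2004',
};
```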
### Testing
- [Pipeline run using FIPS
agents](https://buildkite.com/elastic/kibana-pull-request/builds/215332)
- Failures are expected and this possibly ran with flaky tests
## Summary
With the migration to the shared buildkite infra, we've also switched to
using the ci-prod vault (https://vault-ci-prod.elastic.dev) for all
CI-related secrets. It seemed reasonable at the time to also store the
credentials for the deployments there. This has since proven
unnecessary, and even confusing for developers, as they might not be
adequately set up for accessing the two vaults. We've also learned that
both of these vault instances are here to stay, so there's no push to
migrate everything to the ci-prod instance.
So, this PR switches back to using the legacy vault in all cases for
storing deployment keys, as it fits better with developers' daily
secret-handling duties.
Also, adds a cleanup part to the purge routine.
- [x] extract vault read/write to a parametric shell script, because
the TypeScript invocations of vault won't have easy access to
`set_in_legacy_vault`
## Summary
On the new infra, the publish step will still require legacy vault
credentials and login.
(https://buildkite.com/elastic/kibana-artifacts-staging/builds/3513#018f7691-73c8-4e6f-862b-328b05d9de3b)
As a fix, this PR digs up the credentials from vault instead of gcloud
secrets on the new infra.
Also, other places where role-id/secret-id are used are moved over to
the legacy-vault usage, plus some minor code re-org to reduce branching
and ease future cleanup.
## Summary
We've found that the command tries to log in to docker in cases when the
`docker` binary is available (installed) but not running on the devices.
This can happen, for example, on a device that's set up a bit
differently than our normal CI executors (e.g.: [bare metal
executors](https://buildkite.com/elastic/kibana-single-user-performance/builds/13383)).
This PR makes sure that we only interact with docker in that step if
it's running.
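The guard boils down to something like this (a sketch of the idea, not the exact change):
```typescript
import { execSync } from 'child_process';

// `docker info` only succeeds when the daemon is reachable, so use it to
// decide whether the login/pull steps should run at all.
function isDockerRunning(): boolean {
  try {
    execSync('docker info', { stdio: 'ignore' });
    return true;
  } catch {
    return false;
  }
}

if (isDockerRunning()) {
  // docker login / image pulls happen here; otherwise the step is skipped
}
```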
Originally `docker login` was localized to a single step and was managed
only in that step. Over time, this has expanded to most pipelines and
steps. This changes the pattern to authenticate once during pre-command.
## Summary
This PR refactors a bit of the pre-command env setup, separating parts
so they can be individually skipped. It then removes the setup-avoidance
based on agent types, as this won't be useful after the migration.
It also fixes a missed bit in the agent-targeting rewrite used for the
migration, where `provider: 'gcp'` was missing, and adds optional
targeting for the script.
- add gcp as provider to all rewritten agent targeting rules
- add option to target specific pipelines
- refactor env-var loading into a separate file
- refactor node installs so they can be switched by a flag
- skip the node install in (some) jobs that don't require it
https://github.com/elastic/kibana/pull/177727 updated the endpoint used
to collect APM metrics from CI to a project based deployment. As part of
the change we scaled back the sampling rate to monitor stability. This
reverts the sampling rate change.
Follow up for https://github.com/elastic/kibana/pull/176781 so we can
fix the access to the GCS buckets during snapshot promotion
---------
Co-authored-by: Alex Szabo <alex.szabo@elastic.co>
## Summary
My assumption about access to the legacy coverage bucket was probably
wrong. No account other than the default Kibana CI gcloud account has
write access to that bucket, so with impersonation the copy fails.
This PR separates the copying into two parts and localizes impersonation
to the operations that require it, so it doesn't interfere with access
to the legacy bucket.
## Summary
Once we move to the elastic-wide buildkite agents, and away from the
kibana-buildkite-managed ones, we won't have default access to the
buckets we used to use, as the assumed service account will differ.
**Note:** Although this will only be required on the new infra, this
change can be merged and is expected to work properly on the current
infra as well.
### Solution
We've set up a central service-account with rights to impersonate other
service accounts that have controlled access to individual buckets to
minimize the reach and influence of individual accounts. See:
https://github.com/elastic/kibana-operations/pull/51
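As a concrete example of the pattern, an upload would impersonate a bucket-specific service account rather than relying on the agent's default credentials; the account and bucket names below are made up:
```typescript
import { execSync } from 'child_process';

// Hypothetical account/bucket; each bucket gets its own narrowly-scoped account.
const IMPERSONATED_ACCOUNT = 'kibana-ci-coverage@example-project.iam.gserviceaccount.com';

// gsutil's -i flag runs the command while impersonating the given service account.
execSync(
  `gsutil -i "${IMPERSONATED_ACCOUNT}" cp coverage-summary.json gs://example-coverage-bucket/`,
  { stdio: 'inherit' }
);
```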
**several of the changes weren't tested, as they're part of CI tasks
outside the PR build** - will merge with caution and monitor the
stability afterwards
TODO: _add access, and assume account before other GCS bucket usages_
- [x] storybook
- [x] coverage
(.buildkite/scripts/steps/code_coverage/reporting/uploadPrevSha.sh)
- [x] upload static site
(.buildkite/scripts/steps/code_coverage/reporting/uploadStaticSite.sh)
- [x] SO object migration
(.buildkite/scripts/steps/archive_so_migration_snapshot.sh)
- [x] ES Snapshot manifest upload
(.buildkite/scripts/steps/es_snapshots/create_manifest.ts)
- [x] Scalability?
(.buildkite/scripts/steps/functional/scalability_dataset_extraction.sh)
- [x] Benchmarking
(.buildkite/scripts/steps/scalability/benchmarking.sh)
- [x] Webpack bundle analyzer
(.buildkite/scripts/steps/webpack_bundle_analyzer/upload.ts)
- [x] ~Build chromium (x-pack/build_chromium/build.py)~ Not needed, as
it's manual, and not a CI task
TODO: _others_
- [x] Remove manifest upload
(.buildkite/scripts/steps/es_serverless/promote_es_serverless_image.sh)
- [x] Decide if we should merge with the CDN access: no, SRE is managing
that account
- [x] Bazel remote cache seems to also rely on gcs - roles PR:
https://github.com/elastic/kibana-operations/pull/56
Closes: https://github.com/elastic/kibana-operations/issues/29
Part of: https://github.com/elastic/kibana-operations/issues/15
## Summary
We're moving to a different vault address/instance when we're on the
elastic-wide buildkite infra. While the migration is in progress, we
can bridge between the two using this solution.
✅ Tested the status quo by running the PR pipeline (tests all the loads
from `pre-command`) and by using `ci:cloud-deploy` (tests vault
writing).
🟠 Tested the new vault provider on this PR:
https://github.com/elastic/kibana/pull/171317
The secrets can be accessed, *but they can't be written*, neither by me
nor by the PR pipeline. Change requested here:
https://elasticco.atlassian.net/browse/ENGPRD-414
However, this PR can be merged without figuring out write access to
secrets; it will work as long as we're on the `kibana-buildkite` infra.
---
Closes: https://github.com/elastic/kibana-operations/issues/28
Based on: https://github.com/elastic/kibana/pull/157220
---------
Co-authored-by: Jon <jon@budzenski.me>
## Summary
A small change introduced in
https://github.com/elastic/kibana/pull/170918 exposes a custom env var
and started failing the cache builds on CI
(https://buildkite.com/elastic/kibana-agent-packer-cache/builds/474#018bb945-4910-4f5e-b78b-f020574c5b89).
Apparently `BUILDKITE_BRANCH` is not available in the script: the same
build script is called not from buildkite but through packer, which
probably doesn't forward all the environment variables.
In this case, we can probably default to `""` and let the script skip
the section where this variable is exported, because that export is
probably not meant for the cache build. However, we should keep in mind
that the packer/cache build invokes some scripts with a different env
context (which might lead to different results if we depend on some of
these vars).
chore: use an empty default when missing BUILDKITE_BRANCH to prevent
error
This PR adds an env var on buildkite that lets us understand what the
final merge target is when running inside a merge queue environment. It
will be useful for redirecting a couple of dependent scripts that will
start running inside the merge queue environment once we activate it
again.
---------
Co-authored-by: Jon <jon@budzenski.me>
**Part of: https://github.com/elastic/security-team/issues/6726**
## Summary
Migrates the prebuilt rules and timelines status API route schema to
OpenAPI. This is exploratory work to assess the level of effort required
to migrate API route schemas from `io-ts` to `zod` generated by OpenAPI
codegen.
**Summary of the changes:**
- Added a CI job that runs code generation in Security Solution and
comments if there are any changes.
- Migrated the `/api/detection_engine/rules/prepackaged/_status` route
to use generated `zod` schemas
- Updated schema tests
- Adjusted the code generator templates to handle `strict` schemas,
i.e., schemas that do not allow any extra params
- Updated the error transformation code to work with zod errors.
Validation errors are converted to string representations, like the
following:
(screenshot of a stringified validation error omitted)
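For a rough idea of that transformation, here is a minimal sketch; the helper below is illustrative, not the actual Security Solution utility:
```typescript
import { z } from 'zod';

// Flatten a zod error into a single readable string.
function stringifyZodError(error: z.ZodError): string {
  return error.issues
    .map((issue) =>
      issue.path.length ? `${issue.path.join('.')}: ${issue.message}` : issue.message
    )
    .join(', ');
}

// `.strict()` mirrors the generated schemas that reject unknown params.
const StatusResponse = z.object({ rules_installed: z.number() }).strict();

const result = StatusResponse.safeParse({ rules_installed: 'one', extra: true });
if (!result.success) {
  console.log(stringifyZodError(result.error));
  // e.g. "rules_installed: Expected number, received string, Unrecognized key(s) in object: 'extra'"
}
```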
After chatting with @KOTungseth, @scottybollinger, and @glitteringkatie
we've decided to add a CI step to the Kibana repo that will run when
changes to next-docs-related code are made. This step will check out the
repository containing configuration for the docs.elastic.dev website
(which is currently private, sorry) and then ensure that the build can
be completed with a local copy of all the repositories. It does this by
reading the `config/content.js` files and cloning all of the
repositories listed, then rewriting the content.js file with a map
telling the build system to read files from the local repos (which are
pre-cached by the packer cache job) and the local Kibana repo (which
represents the changes in the PR).
This script can also be run locally with `node
scripts/validate_next_docs`.
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
The new images have an updated gh binary which now requires setting the
`GITHUB_REPO` env var, or calling `gh repo set-default`. I opted for the
env var so that we didn't need to find a good time to execute the CLI
(after the keys are in the env, but before all other user code) or worry
about the logging. This also allows other users of our scripts to
customize it as makes sense for them, without having to dive into a
bunch of imperative shell code.
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>