This PR contains the following updates:
| Package | Update | Change |
|---|---|---|
| docker.elastic.co/ci-agent-images/quality-gate-seedling | patch | `0.0.2` -> `0.0.4` |
---
### Configuration
📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).
🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.
♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.
---
This PR has been generated by [Renovate
Bot](https://togithub.com/renovatebot/renovate).
Co-authored-by: Renovate Bot <bot@renovateapp.com>
Adds a new docker image, `kibana-chainguard` using
[chainguard-base](https://images.chainguard.dev/directory/image/chainguard-base).
For now this is only for testing; the exact naming is TBD.
Testing
```
docker load < kibana-chainguard-8.15.0-SNAPSHOT-docker-image-aarch64.tar.gz
docker run --rm docker.elastic.co/kibana/kibana-chainguard:8.15.0-SNAPSHOT
```
## Summary
We've recently seen a handful of step timeouts when running type checks.
While this is not the best solution, it mitigates potential build
failures and retries caused by timeouts.
This PR also contains some cleanup of earlier type-check-related jobs
(e.g. the [August 2023 type-check
issue](https://github.com/elastic/kibana/pull/167060)).
## Summary
Extends the flaky-test-runner with the capability to comment on the
flaky test runs on the PR that's being tested.
Closes: https://github.com/elastic/kibana/issues/173129
- chore(flaky-test-runner): Add a step to collect results and comment on
the tested PR
## Summary
We stuck with the `elastic-images-qa` repo because that's how we
initially set up the migration scripts, and we didn't bother to switch
over once the images were working: most of the pipelines were low-risk,
and a potential issue would have been easy to fix.
While the same image goes to QA and prod every day, moving forward, we
need to allow some experimentation at the QA images level, as we work on
the caching and further optimizations. We shouldn't allow that
experimentation to affect the already migrated pipelines.
This PR switches over to using the `elastic-images-prod` repo. Images
get promoted here if they're built from the `main` of
https://github.com/elastic/ci-agent-images, or promoted manually from a
branch build.
This change should not affect existing behavior.
We didn't test every pipeline, but the assumption is that if one works,
all of them work:
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/94
Will merge this once 8.13 is no longer active.
This splits the project build and deploy steps into two: build the
container image, and then deploy. This will allow us to build the
project image in parallel with other checks, and deploy later after a
smoke test is completed. Currently this uses static checks of:
- project image build
- linting
- lint with types
- checks
- type check
Time to project deployment is expected to be ~5~ 1 minute longer. If
needed we can expand to functional tests, but in the interim this should
cover the issue we saw in https://github.com/elastic/kibana/pull/180309.
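The resulting step graph can be pictured as Buildkite steps linked by `depends_on`. This is a minimal sketch, not the PR's pipeline definition: the step keys and labels below are hypothetical, and only the dependency relationship described above is modeled.

```typescript
// Minimal model of a Buildkite command step. `key` and `depends_on` are real
// Buildkite step attributes, but the keys used below are hypothetical.
interface PipelineStep {
  key: string;
  label: string;
  depends_on?: string[];
}

const steps: PipelineStep[] = [
  { key: "build_project_image", label: "Build project image" },
  { key: "lint", label: "Linting" },
  { key: "lint_with_types", label: "Lint with types" },
  { key: "checks", label: "Checks" },
  { key: "type_check", label: "Type check" },
  {
    // Deployment waits for the image build and all static checks.
    key: "deploy_project",
    label: "Deploy project",
    depends_on: ["build_project_image", "lint", "lint_with_types", "checks", "type_check"],
  },
];

// A step can start once every step it depends on has passed.
function canStart(step: PipelineStep, passed: ReadonlySet<string>): boolean {
  return (step.depends_on ?? []).every((dep) => passed.has(dep));
}
```

Because the image build has no `depends_on`, it runs in parallel with the checks; only the deploy step waits for all of them.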
## Summary
This PR is addressing the following issues:
- The pipelines defined in
`.buildkite/pipeline-resource-definitions/security-solution-quality-gate/`
folder were skipping intermediate builds. We need to be able to run more
than one build at the same time for these pipelines.
- As part of the refactoring / optimization of the
`.buildkite/scripts/pipelines/security_solution_quality_gate/api_integration/api-integration-tests.sh`
script, it now executes a TS script in order to handle the projects for
serverless and execute the yarn script provided.
- As part of this refactoring, the methods and workflow defined in the
`x-pack/plugins/security_solution/scripts/run_cypress/parallel_serverless.ts`
script are now followed, to reduce code duplication and maintenance.
- Fixed an issue in
`x-pack/test/security_solution_api_integration/scripts/index.js`. This
issue was causing false green test executions in Buildkite: the child
process's exit code was not propagated, so this script exited with 0
even when the child process (the test execution) failed with exit code 1.
- Parameterized
`.buildkite/pipelines/security_solution/api_integration.yml` to run
the correct test suite (release or periodic) depending on whether the
environment variable `QUALITY_GATE=1` is passed.
The exit-code issue above was misleading the interpretation of test
results: test runs with one or more failing tests were reported as
successful. E.g.: [Buildkite Test Execution being green with failing
tests.](https://buildkite.com/elastic/kibana-serverless-security-solution-quality-gate-api-integration/builds/307#018f3409-c062-4edf-9663-3ba785823a6c/294-757)
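The class of bug fixed in `scripts/index.js` can be illustrated with a minimal TypeScript sketch, assuming the wrapper starts the tests with Node's `child_process.spawnSync`; this is not the actual fix from the PR. The point is that the wrapper must exit with the child's status instead of falling through to an implicit exit code 0.

```typescript
import { spawnSync } from "node:child_process";

// Runs the test command, streaming its output, and returns the exit code the
// wrapper script should itself exit with.
function runAndGetExitCode(command: string, args: string[]): number {
  const result = spawnSync(command, args, { stdio: "inherit" });
  if (result.error) {
    // The command could not be started at all (e.g. binary not found).
    return 1;
  }
  // `status` is null when the child was killed by a signal: treat that as a failure.
  return result.status ?? 1;
}

// Usage in the wrapper (the yarn script name is hypothetical):
// process.exit(runAndGetExitCode("yarn", ["run", "some-api-integration-suite"]));
```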
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
## Summary
- Fixed some wrong package.json scripts in order to completely stabilize
the API integration test suite for serverless.
- ~Added @elastic/security-engineering-productivity as CODEOWNERS for
all the work done around the second quality gate in .buildkite folder.~
This PR sets up everything required for running Cypress tests for EDR
Workflows on the MKI QA environment.
MKI pipeline triggered with these changes -
https://buildkite.com/elastic/kibana-serverless-security-solution-quality-gate-defend-workflows/builds/20
---------
Co-authored-by: dkirchan <diamantis.kirchantzoglou@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Paul Tavares <paul.tavares@elastic.co>
Co-authored-by: dkirchan <55240027+dkirchan@users.noreply.github.com>
## Summary
In order to run the tests concurrently, we needed the ability to handle
multiple organizations and to rotate API keys in order to create
projects.
This effort is covered by the work done on the cloud-handler
(@elastic/security-engineering-productivity). The cloud-handler is a
Python FastAPI service connected to a Postgres database, which handles
the multiple organizations for the needs of the Security Kibana Quality
Gate testing, including the periodic pipeline and the future effort to
let developers run the tests against a real MKI.
## Description
All the logic is pretty much handled in the `parallel_serverless.ts`
script.
[At this
point](https://github.com/elastic/kibana/pull/181027/files#diff-a05c7d7d8448c53e20bbd60881deb4786bfffa3cdf654447732aed02e12b3867R223)
we read PROXY_URL, PROXY_CLIENT_ID and PROXY_SECRET. All three must be
defined: the first is the URL of the proxy service, and the other two
provide authentication against the service.
If all three variables are available and the health check confirming
that the service is up and running succeeds ([runs in this
line](https://github.com/elastic/kibana/pull/181027/files#diff-a05c7d7d8448c53e20bbd60881deb4786bfffa3cdf654447732aed02e12b3867R255)),
the script starts creating environments through the proxy handler.
Otherwise, it falls back to the default single-org execution (with the
problems we have faced and are tackling with this effort).
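The decision described above can be sketched as follows. This is a minimal illustration, not the code in `parallel_serverless.ts`: the helper names are hypothetical, and the real health check is a network request, which is injected here as a synchronous predicate to keep the sketch self-contained.

```typescript
// Credentials for the cloud-handler proxy, read from the three env vars named in the PR.
interface ProxyConfig {
  url: string;
  clientId: string;
  secret: string;
}

type Env = Record<string, string | undefined>;

// Returns a config only when all three variables are set and non-empty.
function readProxyConfig(env: Env): ProxyConfig | undefined {
  const { PROXY_URL, PROXY_CLIENT_ID, PROXY_SECRET } = env;
  if (!PROXY_URL || !PROXY_CLIENT_ID || !PROXY_SECRET) {
    return undefined;
  }
  return { url: PROXY_URL, clientId: PROXY_CLIENT_ID, secret: PROXY_SECRET };
}

type ExecutionMode = "proxy" | "single-org";

// Uses the proxy only if the config is complete AND the health check passes;
// otherwise falls back to the default single-organization execution.
function chooseExecutionMode(
  env: Env,
  isHealthy: (config: ProxyConfig) => boolean,
): ExecutionMode {
  const config = readProxyConfig(env);
  if (config === undefined) {
    return "single-org"; // the health check is never attempted without a config
  }
  return isHealthy(config) ? "proxy" : "single-org";
}
```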
If the flow proceeds with the proxy service, it creates the
environment (the create-environment request body is unchanged, so no
changes are needed in the test codebase), and the response body
indicates the name of the organization being used.
e.g.:
```
{
"alias": "local-gizmo-tests-e2ebcd",
"cloud_id": "local-gizmo-tests:ZXUtd2VzdC0xLmF3cy5xYS5lbGFzdGljLmNsb3VkJGUyZWJjZGZmMzY0YTRmYjliMjRmOGVkMGM0MjI2NThlLmVzJGUyZWJjZGZmMzY0YTRmYjliMjRmOGVkMGM0MjI2NThlLmti",
"project_id": "e2ebcdff364a4fb9b24f8ed0c422658e",
"name": "local-gizmo-tests",
"region_id": "aws-eu-west-1",
"project_type": "security",
"admin_features_package": "standard",
"creds_password": "f6RoNM84wQ4tBml3p13069uJ",
"creds_username": "admin",
"elasticsearch_endpoint": "https://local-gizmo-tests-e2ebcd.es.eu-west-1.aws.qa.elastic.cloud",
"kibana_endpoint": "https://local-gizmo-tests-e2ebcd.kb.eu-west-1.aws.qa.elastic.cloud",
"created_at": "2024-04-22T15:05:28.970745",
"id": 1856,
"organization_id": 16,
"organization_name": "sec-sol-auto-01"
}
```
This organization name then determines the roles file that SAML
authentication uses to authenticate the users. This change is
implemented in the following parts:
- [The PROXY_ORG Cypress env
var](https://github.com/elastic/kibana/pull/181027/files#diff-a05c7d7d8448c53e20bbd60881deb4786bfffa3cdf654447732aed02e12b3867R475)
is defined.
- [A roles filename is
created](https://github.com/elastic/kibana/pull/181027/files#diff-5537ddd27eb2b8d7a4809e1bd9a28a4e6c23f3caa6a9b504b9c94ee037070315R34)
only if PROXY_ORG is defined, and is handed over to the
SamlSessionManager.
- [If the roles filename is
provided,](https://github.com/elastic/kibana/pull/181027/files#diff-f63bfdabc35b838460de6b7e758d1bc168b54ba6ff418a8ad936d716c88af964R51)
then it is respected; otherwise the default `role_users.json` is used.
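A minimal sketch of the roles-file selection, assuming one roles file per organization. Only the `organization_name` field (from the example response above) and the default `role_users.json` come from the PR; the `<org>.json` naming scheme and the helper names are hypothetical.

```typescript
const DEFAULT_ROLES_FILE = "role_users.json";

// Reads the organization name from the proxy's create-environment response.
function proxyOrgFromResponse(body: { organization_name?: unknown }): string | undefined {
  const org = body.organization_name;
  return typeof org === "string" && org.length > 0 ? org : undefined;
}

// Hypothetical naming scheme, e.g. "sec-sol-auto-01.json"; the PR's actual
// filename format may differ. Without an org, the default file is used.
function rolesFilename(proxyOrg: string | undefined): string {
  return proxyOrg ? `${proxyOrg}.json` : DEFAULT_ROLES_FILE;
}
```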
## Relevant successful executions:
- https://buildkite.com/elastic/security-serverless-quality-gate-kibana-periodic/builds/202
- https://buildkite.com/elastic/security-serverless-quality-gate-kibana-periodic/builds/203
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Gloria Hornero <gloria.hornero@elastic.co>
This build step doesn't support retries: if the docker image has already
been uploaded once, it exits early. We want to rule out spot preemptions
as a cause of failure.
Continues the effort of https://github.com/elastic/kibana/pull/179737:
we are aligning the tags on Cypress and API tests to have a unified
experience.
## Summary
We want to start integrating our Cypress tests with the serverless
Kibana quality gate. However, not all teams feel comfortable enabling
all their tests. To make it easier to enable tests in the quality gate,
we are adding the `@serverlessQA` tag, now to API tests as well.
We use tags to select which tests we want to execute on each environment
and pipeline.
- `ess` - runs in ESS env
- `serverless` - runs in serverless env and periodic pipeline (failures
don't block release)
- `serverlessQA` - runs in Kibana release process (failures block release)
- `skipInEss` - skipped for ESS env
- `skipInServerless` - skipped for all serverless-related environments
- `skipInServerlessMKI` - skipped for MKI environments
### Description
**Tests tagged as `@serverless`**
All the tests tagged as `@serverless` will be executed as part of the PR
validation process using the serverless FTR environment (not a real
one). These tests will also be executed as part of the periodic
pipeline against a real serverless project in the QA environment, using
the latest commit available in `main` at the time of execution.
**Tests tagged as `@serverlessQA`**
All the tests tagged as `@serverlessQA` will be executed as part of the
Kibana release process using a real serverless project with the latest
image available in the QA environment.
**Tests tagged as `@ess`**
All the tests tagged as `@ess` will be executed as part of the PR
validation process using an on-prem ESS environment.
**Tests tagged as `@skipInServerless`**
All the tests tagged as `@skipInServerless` will be excluded from the PR
validation process using the serverless FTR environment, the periodic
pipeline and kibana release process for Serverless.
**Tests tagged as `@skipInEss`**
All the tests tagged as `@skipInEss` will be excluded from the PR
validation process using an on-prem ESS environment.
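The tag semantics above can be summarized as a selection function. This is an illustrative model only, not how the FTR or Cypress tooling actually filters tests: the target names are hypothetical, and it assumes that both the periodic pipeline and the release quality gate run against MKI projects, so `@skipInServerlessMKI` applies to both.

```typescript
type Tag =
  | "@ess"
  | "@serverless"
  | "@serverlessQA"
  | "@skipInEss"
  | "@skipInServerless"
  | "@skipInServerlessMKI";

// Where a run happens (hypothetical names).
type Target =
  | "ess-pr" // PR validation, on-prem ESS
  | "serverless-pr" // PR validation, serverless FTR environment
  | "serverless-periodic" // periodic pipeline, real (MKI) project
  | "serverless-release"; // Kibana release quality gate, real (MKI) project

function shouldRun(tags: readonly Tag[], target: Target): boolean {
  const has = (tag: Tag) => tags.includes(tag);
  switch (target) {
    case "ess-pr":
      return has("@ess") && !has("@skipInEss");
    case "serverless-pr":
      return has("@serverless") && !has("@skipInServerless");
    case "serverless-periodic":
      return has("@serverless") && !has("@skipInServerless") && !has("@skipInServerlessMKI");
    case "serverless-release":
      // Only @serverlessQA tests gate the release.
      return has("@serverlessQA") && !has("@skipInServerless") && !has("@skipInServerlessMKI");
  }
}
```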
---------
Co-authored-by: Ryland Herrick <ryalnd@gmail.com>
Runs lint checks before proceeding with tests. This is already
implemented in the pull request pipeline, and will allow the pipeline to
end early if there are lint errors.
## Summary
Closes https://github.com/elastic/ingest-dev/issues/3065
Added the Fleet synthetic monitor check to the Kibana staging quality
gates. It has been stable for the past two weeks; it is added with soft
fail for now.
This monitor verifies that a long running project is healthy in staging.
It aims to flag issues if there is a breaking change in Kibana / Fleet
plugin.
## Summary
Prior to this change, we've only run Serverless-related FTR tests on the
[ES Serverless verification & promotion
job](https://buildkite.com/elastic/kibana-elasticsearch-serverless-verify-and-promote).
This did not give complete coverage: it left out the serverless tests in
the Jest integration set.
This PR adds a run of the complete Jest integration test set to the
verification pipeline (we currently don't have a way to filter for
serverless-related tests only). This required the integration test
startup code to respect the ES Serverless docker image override we use
in the test setup.
- Adds option for custom headers in OpenAI connector, which is needed to
configure [Portkey's gateway](https://github.com/Portkey-AI/gateway)
- Removes `additionalProperties` and `additionalItems`, which are not
compatible with OpenAPI (which is what Google Gemini uses)
- Uses `tools` instead of `functions`, which is converted by Portkey
Gateway (`functions` is ignored/passed through as-is)
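The two request-side transformations above can be sketched as follows, assuming OpenAI's chat-completions shapes: `functions` entries with `name`/`description`/`parameters`, and `tools` entries of the form `{ type: "function", function: ... }`. The helper names are hypothetical, and the connector's real code may differ.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Keywords removed from parameter schemas, as listed in the PR.
const UNSUPPORTED_KEYS = new Set(["additionalProperties", "additionalItems"]);

// Recursively drops unsupported keywords. Note: this naive version would also
// drop a *property* literally named "additionalProperties".
function stripUnsupportedKeys(schema: Json): Json {
  if (Array.isArray(schema)) {
    return schema.map(stripUnsupportedKeys);
  }
  if (schema !== null && typeof schema === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(schema)) {
      if (!UNSUPPORTED_KEYS.has(key)) {
        out[key] = stripUnsupportedKeys(value);
      }
    }
    return out;
  }
  return schema;
}

// OpenAI-style function definition and its `tools` wrapper.
interface FunctionDefinition {
  name: string;
  description?: string;
  parameters: Json;
}

interface Tool {
  type: "function";
  function: FunctionDefinition;
}

// Wraps legacy `functions` entries as `tools`, cleaning each parameter schema.
function toTools(functions: FunctionDefinition[]): Tool[] {
  return functions.map(
    (fn): Tool => ({
      type: "function",
      function: { ...fn, parameters: stripUnsupportedKeys(fn.parameters) },
    }),
  );
}
```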
---------
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
## Summary
This PR migrates some pipelines that can be migrated the cheap way: by
deleting them, and re-creating them the backstage-way.
Recreates pipelines removed in:
https://github.com/elastic/kibana-buildkite/pull/168
Plus:
- adds the coverage job as a RRE, as it was not previously managed by
terraform, but we got a green light to remove the manually created
pipeline, and recreate it
- adds a script to update locations.yml after any updates (useful to
settle conflicts, or manual additions)
- updates the slack channel for the grammar update script (as requested
by @stratoula)
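The `locations.yml` update script can be pictured as regenerating the target list from the directory contents. A minimal sketch, assuming `locations.yml` is a Backstage `Location` entity whose `spec.targets` point at the individual pipeline resource files; the entity name and file layout are hypothetical. Regenerating the list from scratch, deduplicated and sorted, is what makes such a script handy for settling merge conflicts.

```typescript
// Renders a Backstage `Location` entity whose `spec.targets` list every
// pipeline resource file. The entity name below is hypothetical.
function renderLocations(fileNames: readonly string[]): string {
  const targets = Array.from(new Set(fileNames))
    .filter((name) => name.endsWith(".yml") && name !== "locations.yml")
    .sort();
  const lines = [
    "apiVersion: backstage.io/v1alpha1",
    "kind: Location",
    "metadata:",
    "  name: example-pipeline-locations",
    "spec:",
    "  type: url",
    "  targets:",
    ...targets.map((name) => `    - ./${name}`),
  ];
  return lines.join("\n") + "\n";
}
```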
Todos:
- [x] Fix grammar sync script, to work on new infra (ssh/https
switchover + access rights)
(https://github.com/elastic/kibana/pull/179921)
- [x] Fix missing `antlr` issue:
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/16#018e9fab-1609-4ae2-b771-3b346cc616ac
## Summary
This PR introduces a significant improvement in the way we handle
failing tests within our Cypress test suite. Previously, when a test
spec failed during a job, our approach was to retry the entire set of
specs, which was not only time-consuming but also inefficient. This
process often resulted in unnecessary reruns of tests that had already
passed, leading to increased resource consumption and longer feedback
cycles for developers.
With the changes introduced in this PR, we now target a more efficient
and logical approach by retrying only the specific spec that failed,
rather than the entire suite. This focused retry logic means that if a
job encounters a failing test, only that particular test will be rerun.
This adjustment significantly reduces the overall execution time of our
test suite and minimizes the consumption of valuable Buildkite resources.
Key benefits of this change include:
- **Reduced Test Execution Time**: By avoiding unnecessary reruns of
passing tests, we significantly cut down the total time spent on test
executions.
- **Improved Resource Utilization**: This change ensures a more
judicious use of our CI/CD resources, allowing for more efficient
processing of jobs and reducing potential bottlenecks in our testing
pipeline.
- **Faster Feedback Loops**: Developers will receive quicker feedback on
the status of their tests, enabling them to address failures more
promptly and efficiently.
- **Increased Test Suite Reliability**: By focusing on retrying only the
failing tests, we can more accurately identify flaky tests and work
towards improving the stability of our test suite.
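The retry policy described above can be modeled as follows. This is an illustrative sketch, not the PR's implementation: `SpecRunner`, `runWithSpecRetries` and the retry limit are hypothetical, and the real runner executes Cypress specs on Buildkite agents.

```typescript
interface SpecResult {
  spec: string;
  passed: boolean;
}

// Runs the given specs and reports one result per spec.
type SpecRunner = (specs: string[]) => SpecResult[];

// Only failed specs are scheduled again.
function failedSpecs(results: SpecResult[]): string[] {
  return Array.from(new Set(results.filter((r) => !r.passed).map((r) => r.spec)));
}

// Runs all specs once, then retries only the failing ones, up to `maxRetries` times.
function runWithSpecRetries(
  specs: string[],
  run: SpecRunner,
  maxRetries: number,
): { stillFailing: string[]; attempts: number } {
  let pending = specs;
  let attempts = 0;
  while (pending.length > 0 && attempts <= maxRetries) {
    attempts += 1;
    pending = failedSpecs(run(pending));
  }
  return { stillFailing: pending, attempts };
}
```

A spec still listed in `stillFailing` after the last attempt fails the job; a spec that passes only on retry is a useful flakiness signal.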
As part of the migration of pipelines from kibana-buildkite repo here, I
have migrated the Defend Workflows pipeline and the ESS Security
Solution.
Relevant PR in kibana-buildkite repo:
https://github.com/elastic/kibana-buildkite/pull/166
## Summary
When we rolled out the pipeline settings resulting from the defaults of
the backstage-way of defining resources, we encountered a few defaults
we didn't know about.
This PR adjusts these missing values (they're not relevant for this job,
as this job is more a process starter than a branch-related build job)
and renames a used pipeline implementation to something more accurate.
We recently had failures on the ES Serverless promotion pipeline and
noticed these tests were not running on merge. This re-adds the tests to
on-merge to be consistent with the rest of the Cypress tests.
Candidate for the first buildkite pipeline to be migrated from
`kibana-buildkite` to the elastic-wide system.
The pipeline represented is:
https://buildkite.com/elastic/kibana-serverless-release-1
Quirk:
- When the pipeline was created, another pipeline with the same name was
also created (it has since been removed), so this one was assigned a
`-1` suffix. Hopefully this is not a problem when we consider the
takeover.
This PR contains:
- `kibana-serverless-release.yml` - an automatic rewrite of the pipeline
resource from
https://github.com/elastic/kibana-buildkite/blob/main/pipelines/kibana-serverless-release.tf
- `locations.yml` - since we collect the pipelines in such a file to
avoid bloating `catalog-info.yaml`, the location needs to be updated
- `create_deployment_tag.yml` - updates the pipeline implementations
with agent targeting rules (since we can no longer use
https://github.com/elastic/buildkite-agent-manager)
---------
Co-authored-by: Jon <jon@budzenski.me>
## Summary
Introduces a CI job to check for changes to the Elasticsearch grammar.
Part of https://github.com/elastic/kibana/issues/178262
The first time this job runs, it will result in a PR to update the
grammar because of formatting differences. That should be merged. Then,
it will only create a PR when something has changed on the Elasticsearch
side.
---------
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
## Summary
We've had some issues with the weekly scheduled serverless release since
we switched to a direct trigger to `gpctl-promote`.
We've tried a few options and have probably found the solution, but to
be sure, we'd like to try a dry-run cross trigger.
This wasn't possible before, but
https://github.com/elastic/gpctl/pull/261 should now respect DRY_RUN env
vars coming in.
This PR propagates those variables, so we can test the setup.
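A minimal sketch of the propagation, modeling the trigger step as a plain object. `trigger`, `label` and `build.env` are real Buildkite trigger-step attributes; the assumption that the triggered pipeline's slug is `gpctl-promote` comes from the text above, while the label and helper name are hypothetical.

```typescript
// Subset of a Buildkite trigger step.
interface TriggerStep {
  trigger: string;
  label: string;
  build: { env: Record<string, string> };
}

// Only DRY_RUN is named in the PR; any other names would be additions.
const PROPAGATED_VARS = ["DRY_RUN"] as const;

function promoteTriggerStep(
  env: Record<string, string | undefined>,
  buildEnv: Record<string, string> = {},
): TriggerStep {
  const propagated: Record<string, string> = {};
  for (const name of PROPAGATED_VARS) {
    const value = env[name];
    if (value !== undefined) {
      propagated[name] = value; // forwarded to the triggered build as-is
    }
  }
  return {
    trigger: "gpctl-promote",
    label: "Trigger gpctl-promote", // hypothetical label
    build: { env: { ...buildEnv, ...propagated } },
  };
}
```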
## Summary
This PR refactors a bit of the pre-command env setup, separating parts,
so they can be individually skipped. Then it removes the setup-avoidance
based on agent types, as this won't be useful after the migration.
Also, it fixes a missed bit in the agent-targeting rewrite used for the
migration, where `provider: 'gcp'` was missing, and adds optional
pipeline targeting to the script.
- add gcp as provider to all rewritten agent targeting rules
- add option to target specific pipelines
- refactor env-var loading to a separated file
- refactor node installs so it can be switched by a flag
- skip node installing in (some) jobs that don't require it
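The first two items in the list above can be sketched as follows. The `AgentRule` shape and the helper names are hypothetical; only the `provider: 'gcp'` value comes from the PR.

```typescript
// Hypothetical shape of an agent targeting rule in a pipeline definition.
interface AgentRule {
  provider?: string;
  image?: string;
  machineType?: string;
}

// Every rewritten rule gets `provider: "gcp"`, the bit the earlier rewrite missed.
function withGcpProvider(rule: AgentRule): AgentRule {
  return { ...rule, provider: "gcp" };
}

// Rewrites the rules of each pipeline file; if `targets` is given, only those
// pipeline files are touched and the others are returned unchanged.
function rewritePipelines(
  pipelines: Record<string, AgentRule[]>,
  targets?: readonly string[],
): Record<string, AgentRule[]> {
  const result: Record<string, AgentRule[]> = {};
  for (const [file, rules] of Object.entries(pipelines)) {
    const selected = targets === undefined || targets.includes(file);
    result[file] = selected ? rules.map(withGcpProvider) : rules;
  }
  return result;
}
```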
## Summary
We'd like to prevent container builds when forked to a
`deploy-fix@<timestamp>` branch, on commits that are already contained
in `main` (thus already built into an image).
This happens because, when forking off, the branch is created from the
commit at `main`'s `HEAD`; the pipeline picks up that first commit and
fails when building a container that already exists.
Solution:
- check if the current commit is in `(upstream|origin)/main` - if it is,
we don't need to emit the trigger step.
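A minimal sketch of the check, assuming it shells out to `git merge-base --is-ancestor` (a real git subcommand; whether the PR uses it is an assumption) and that both remotes have been fetched. The git runner is injectable so the logic can be exercised without a repository; the function names are hypothetical.

```typescript
import { spawnSync } from "node:child_process";

// Runs `git` with the given arguments and returns its exit status.
type GitRunner = (args: string[]) => number;

const runGit: GitRunner = (args) => {
  const result = spawnSync("git", args, { stdio: "ignore" });
  // A null status (spawn failure or signal) is reported as a generic error code.
  return result.status ?? 128;
};

// `git merge-base --is-ancestor A B` exits 0 if A is an ancestor of B (or the
// same commit), 1 if it is not, and another non-zero code on error.
function isCommitInMain(commit: string, git: GitRunner = runGit): boolean {
  for (const ref of ["upstream/main", "origin/main"]) {
    if (git(["merge-base", "--is-ancestor", commit, ref]) === 0) {
      return true;
    }
    // 1 (not contained) or an error such as a missing remote: try the next ref.
  }
  return false;
}

// The container build trigger is only needed for commits not yet in main.
function shouldEmitBuildTrigger(commit: string, git: GitRunner = runGit): boolean {
  return !isCommitInMain(commit, git);
}
```

If neither ref contains the commit, the trigger step is emitted as before.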
Tests:
- [x] Test trigger: in [this
build](https://buildkite.com/elastic/kibana-serverless-emergency-release-branch-testing/builds/12#_),
I accidentally inverted the DRY_RUN functionality; at least we know the
trigger works if needed.
- [x] Test with a supplied commit sha (this
[build](https://buildkite.com/elastic/kibana-serverless-emergency-release-branch-testing/builds/14#018dd6fe-2d3d-4430-adf2-e8dd50c8f79c))
Bonus:
- Fixes an emoji (in a different trigger step) that doesn't exist in
Buildkite, but that we had copied over from other labels 🤷 (from
#176505)
Closes: https://github.com/elastic/kibana-operations/issues/68
## Summary
After some changes and refactors in the codebase, the YAML file that
spins up the tests was broken. This PR makes the fixes required to get
the serverless tests running again.
## Summary
This job is to help with the kibana-buildkite -> elastic-wide buildkite
migration (https://github.com/elastic/kibana-operations/issues/15).
The idea is the following:
- this PR will create a backstage resource in `catalog-info.yaml` that
triggers the creation of a buildkite pipeline. (*)
- this pipeline will be within the `gobld` universe, using the
elastic-wide infrastructure and agent images.
- if we use this pipeline, edit, and run on a specific branch, we can
test further pipelines in the `gobld` universe, thus we can test a
pipeline's behavior before creating a resource for it, or altering the
currently existing pipelines
(*) - by creating this pipeline, it also tests the idea of having a
router file (`locations.yml`) instead of cramming every pipeline
definition into `catalog-info.yaml`