Commit graph

488 commits

Author SHA1 Message Date
Shahzad
8b7fa0d3f8
[SLO] Synthetics based SLO e2e tests (#183637)
## Summary

Setting up Elastic/Synthetics based SLO e2e tests.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2024-05-20 11:49:22 +02:00
Isaac Karrer
135961b720
Update docker.elastic.co/ci-agent-images/quality-gate-seedling Docker tag to v0.0.4 (#183781)
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| docker.elastic.co/ci-agent-images/quality-gate-seedling | patch | `0.0.2` -> `0.0.4` |

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

This PR has been generated by [Renovate
Bot](https://togithub.com/renovatebot/renovate).

Co-authored-by: Renovate Bot <bot@renovateapp.com>
2024-05-17 15:06:18 -07:00
Alex Szabo
5e95a76796
[CI] Use new-infra-type agent targeting for chainguard build (#183545)
## Summary
This PR ties https://github.com/elastic/kibana/pull/183200 +
https://github.com/elastic/kibana/pull/182582 together
2024-05-15 19:26:44 +02:00
Alex Szabo
834ea8810c
[CI] Increase artifact build timeout (#183482)
## Summary
Increases step timeout for building the whole artifact collection by
15m.

With some recent additions ([chainguard
build](https://github.com/elastic/kibana/pull/183200) adds ~7m) and the
new infra overhead, we've gone from ~50-52 minutes to ~57-60 minutes
(this one timed out exactly on the last bit:
https://buildkite.com/elastic/kibana-artifacts-snapshot/builds/4295#018f7b4c-3629-4c4f-8d80-85b2552a43c4)
2024-05-15 12:28:58 +02:00
Jon
6aa7987eeb
[build] Add image based on chainguard (#183200)
Adds a new docker image, `kibana-chainguard` using
[chainguard-base](https://images.chainguard.dev/directory/image/chainguard-base).
For now this is only for testing, exact naming tbd.

Testing
```
docker load < kibana-chainguard-8.15.0-SNAPSHOT-docker-image-aarch64.tar.gz
docker run --rm docker.elastic.co/kibana/kibana-chainguard:8.15.0-SNAPSHOT
```
2024-05-14 16:10:07 -05:00
Jon
0982835800
[ci] Fix ci:build-serverless-image (#183394)
artifacts_container_image.yml is attempting to select an agent that
doesn't exist on Kibana CI's infrastructure
2024-05-14 09:26:32 -05:00
Patryk Kopyciński
13db1c9b21
Downgrade Cypress to 13.6.2 (#183047)
## Summary

We noticed some instability in the current Cypress version; downgrading
seems to solve the issue.
2024-05-14 16:04:23 +02:00
Alex Szabo
d5362fdaf7
[BK] Migrate batch 1 (Artifact builds) (#182582)
## Summary
Migrates batch 1 - artifact builds. The upload aspect wasn't tested,
because it's programmed only to run from `main`, and we didn't want to
interfere with the ongoing releases. This can be tested after the merge.

Verification:
- [x] RREs tested locally
- [x] kibana / artifacts trigger
(https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/87)
- [x] kibana / artifacts container image
(https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/86)
- [x] kibana / artifacts snapshot
(https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/88)
- [x] kibana / artifacts staging
(https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/88)
- [x] 8.13 / 8.14 / 7.17 verification (only a few jobs need to work
here)

Originals:
- kibana / artifacts trigger
[kibana-artifacts-trigger.yml](https://buildkite.com/elastic/kibana-artifacts-trigger)
- kibana / artifacts container image
[kibana-artifacts.yml](https://buildkite.com/elastic/kibana-artifacts-container-image)
- kibana / artifacts snapshot
[kibana-artifacts.yml](https://buildkite.com/elastic/kibana-artifacts-snapshot)
- kibana / artifacts staging
[kibana-artifacts.yml](https://buildkite.com/elastic/kibana-artifacts-staging)

Backports:
 - https://github.com/elastic/kibana/pull/182781
 - https://github.com/elastic/kibana/pull/182780
 
The backports don't need to have the pipeline resource definition files,
however, we forked 8.14 off from main, where we already had the
resources. I'll remove all the unnecessary resource defs from the legacy
branches, once we finalize the state (simply to save a little
inconvenience on future backports.)
2024-05-13 16:06:55 +02:00
Alex Szabo
ef35ee9db6
[CI] Mitigate typecheck timeout issues (#183257)
## Summary
We've recently seen a handful of step timeouts when running type-checks.
While this is not the best solution, it mitigates potential build
failures and retries caused by timeouts.

This PR also contains some cleanup around previous type-check related
jobs (e.g. the [type-check issue of August
2023](https://github.com/elastic/kibana/pull/167060)).
2024-05-13 15:17:12 +02:00
Alex Szabo
38d4230e61
[CI] Comment flaky test results on tested PR (#183043)
## Summary
Extends the flaky-test-runner with the capability to comment on the
flaky test runs on the PR that's being tested.

Closes: https://github.com/elastic/kibana/issues/173129

- chore(flaky-test-runner): Add a step to collect results and comment on
the tested PR
2024-05-13 03:47:30 -07:00
Alex Szabo
a1b32be9a2
[CI] Switch to use elastic-images-prod for all migrated pipeline steps (#183140)
## Summary
We stuck with using the `elastic-images-qa` because that's how we
initially set up the migration scripts and didn't bother to switch over
once we got the images working as most of the pipelines were low-risk,
and a potential issue would have been easy to fix.

While the same image goes to QA and prod every day, moving forward, we
need to allow some experimentation at the QA images level, as we work on
the caching and further optimizations. We shouldn't allow that
experimentation to affect the already migrated pipelines.

This PR switches over to using the `elastic-images-prod` repo. Images
get promoted here if they're built from the `main` of
https://github.com/elastic/ci-agent-images, or promoted manually from a
branch build.

This change should not affect existing behavior. 
We didn't test every pipeline, but the assumption is that if one works,
all work:
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/94

Will merge this once 8.13 is no longer active.
2024-05-10 10:03:11 -05:00
Jon
240b54180a
[ci/project-deploy] Run more checks before deploying (#183058)
This splits the project build and deploy steps into two: build the
container image, and then deploy. This will allow us to build the
project image in parallel with other checks, and deploy later after a
smoke test is completed. Currently this uses static checks of:
- project image build
- linting
- lint with types
- checks
- type check

Time to project deployment is expected to be ~5~ 1 minute longer. If
needed we can expand to functional tests, but in the interim this should
cover the issue we saw in https://github.com/elastic/kibana/pull/180309.
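
The gating described above can be sketched as a simple predicate. This is only an illustration of the idea: the actual change lives in the pipeline configuration, which is not shown here, and the names `StaticCheck` and `readyToDeploy` are hypothetical.

```typescript
// The static checks listed in the commit message.
const STATIC_CHECKS = [
  "project image build",
  "linting",
  "lint with types",
  "checks",
  "type check",
] as const;

type StaticCheck = (typeof STATIC_CHECKS)[number];

// Deploy only when every listed check has reported success.
// A check with no recorded result counts as not passed.
function readyToDeploy(results: Partial<Record<StaticCheck, boolean>>): boolean {
  return STATIC_CHECKS.every((check) => results[check] === true);
}
```

Treating a missing result as a failure keeps the deploy step from running when a check was skipped or never reported.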
2024-05-10 06:33:33 -05:00
Jon
07b8df2a1e
[ci/on-merge] Security solution tests depend on quick_checks (#183031)
Currently, if quick checks fail, security solution tests will continue
to run. We want to skip running the extended test suite.
2024-05-09 17:17:00 -05:00
Jon
4982ead45c
[ci/verify-es-serverless] Add annotation with command to run es image locally (#182579)
https://buildkite.com/elastic/kibana-elasticsearch-serverless-verify-and-promote/builds/1074#annotation-es-serverless-run
2024-05-06 17:12:43 -05:00
dkirchan
75c7f1190d
[Security][Serverless] FTR API Integration tests - Refactoring - Issue fixing (#182245)
## Summary

This PR is addressing the following issues:
- The pipelines defined in
`.buildkite/pipeline-resource-definitions/security-solution-quality-gate/`
folder were skipping intermediate builds. We need to be able to run more
than one build at the same time for these pipelines.
- As part of the refactoring / optimization of the
`.buildkite/scripts/pipelines/security_solution_quality_gate/api_integration/api-integration-tests.sh`
script, it now executes a TS script in order to handle the projects for
serverless and execute the yarn script provided.
- As part of this refactoring, the methods and workflow defined in
`x-pack/plugins/security_solution/scripts/run_cypress/parallel_serverless.ts`
are now followed in order to reduce code duplication and maintenance effort.
- Fixed an issue in
`x-pack/test/security_solution_api_integration/scripts/index.js`. This
issue was causing false green test executions in buildkite. The exit
code was not actually returned from the child process so the exit code
of this script was 0, even though the child process (test execution) was
failing giving back an exit code 1.
- Parameterized
`.buildkite/pipelines/security_solution/api_integration.yml` to be
running the correct test suite (release or periodic) depending on
whether the environment variable `QUALITY_GATE=1` is passed or not.

The exit-code issue made test results misleading: script runs with one
or more test failures were reported as successful.
E.g: [Buildkite Test Execution being green with failing
tests.](https://buildkite.com/elastic/kibana-serverless-security-solution-quality-gate-api-integration/builds/307#018f3409-c062-4edf-9663-3ba785823a6c/294-757)
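
The false-green failure mode can be sketched as follows. This is a minimal illustration, not the code from `index.js`: `ChildResult` and `exitCodeFor` are hypothetical names, and the result shape only mirrors what a process-spawning API typically reports.

```typescript
// Hypothetical shape of a finished child process: a numeric exit status,
// or null when the process was terminated by a signal.
interface ChildResult {
  status: number | null;
  signal: string | null;
}

// Map the child's outcome to the exit code the wrapper script should return.
// The bug pattern is a wrapper that ignores this result and always exits 0.
function exitCodeFor(result: ChildResult): number {
  if (result.status !== null) {
    return result.status; // propagate the child's own exit code
  }
  // Terminated by a signal: report failure (1 is an arbitrary choice here).
  return 1;
}
```

The wrapper would then exit with the returned value, so a failing test run makes the Buildkite step fail as well.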

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2024-05-03 13:50:03 +02:00
Alex Szabo
77961a68e0
[BK] Migrate batch 7 (Performance) (#181133)
## Summary

Validation:
 - [x] RREs checked locally
 - [x] Pipelines staged
   - [x] kibana / single-user-performance
     (https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/62)
   - [x] kibana / performance-data-set-extraction
     (https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/60#018f0b96-7703-4e36-9924-f405073d0747)
   - [x] kibana / scalability-benchmarking
     (https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/68)
   - [x] kibana / apis-capacity-testing
     (https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/60#018f0b96-7703-4e36-9924-f405073d0747)
 - [x] 7.17 / 8.14 validation (not needed, no branch builds set up)


Part of: https://github.com/elastic/kibana-operations/issues/79

Migrates: 
- kibana / single-user-performance
[kibana-performance-daily.yml](https://buildkite.com/elastic/kibana-single-user-performance)
- kibana / performance-data-set-extraction
[kibana-performance-data-set-extraction-daily.yml](https://buildkite.com/elastic/kibana-performance-data-set-extraction)
- kibana / scalability-benchmarking
[scalability_testing-daily.yml](https://buildkite.com/elastic/kibana-scalability-benchmarking-1)
- kibana / apis-capacity-testing
[kibana-apis-capacity-testing-daily.yml](https://buildkite.com/elastic/kibana-apis-capacity-testing)

chore(BK): Migrate batch 7 - performance and testing

Depends on: https://elasticco.atlassian.net/browse/ENGPRD-524
2024-04-30 11:20:45 +02:00
Alex Szabo
4a90df23b6
[Fix] fix type issues from unparameterized PropsWithChildren type usages (#182014)
## Summary
Original problem: `PropsWithChildren` require a generic type parameter
(there's no default). This was not made visible in the merged PR,
because we had type-checking on the PRs temporarily (accidentally)
removed.

This PR fixes the fallout from
https://github.com/elastic/kibana/pull/181257 => Errors:
https://buildkite.com/elastic/kibana-on-merge/builds/44454
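
To illustrate the original problem, here is a minimal sketch that uses a local stand-in for the React type. `PropsWithChildrenNoDefault` and `PanelProps` are hypothetical names, and the stand-in only mirrors a definition without a default type parameter, which is what the summary describes.

```typescript
// Stand-in for a PropsWithChildren definition that has no default
// type parameter (children typed loosely to keep the sketch dependency-free).
type PropsWithChildrenNoDefault<P> = P & { children?: unknown };

interface PanelProps {
  title: string;
}

// OK: the generic parameter is supplied explicitly.
type PanelWithChildren = PropsWithChildrenNoDefault<PanelProps>;

// OK: components with no props of their own pass an empty object type.
type ChildrenOnly = PropsWithChildrenNoDefault<{}>;

// Compile error if uncommented: the generic type requires 1 type argument.
// type Broken = PropsWithChildrenNoDefault;

const panel: PanelWithChildren = { title: "Overview", children: "body" };
const wrapper: ChildrenOnly = { children: panel };
```

An unparameterized usage like the commented-out line only surfaces during type-checking, which is why it went unnoticed while type-checking on PRs was disabled.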
2024-04-29 23:08:52 +01:00
Brad White
306bcf6e85
[ci] Add FIPS Vagrant box and nightly testing pipeline (#176980)
## Summary

- Closes elastic/kibana-operations#26
- Adds a Vagrant box and corresponding Ansible playbook to create a test
environment for FIPS
- Adds a daily pipeline to run a subset of FTR tests in FIPS mode

### Known Issues
1. The compilation of OpenSSL in FIPS mode is breaking some of the OS
libraries and functionality (`sudo` / `dnf` likely more). Possibly due
to custom OpenSSL installation using different locations than the OS
version.
2. ES is having trouble starting, likely due to issue 1 ([Log
link](https://buildkite.com/elastic/kibana-pull-request/builds/205420#018f0c58-3dc3-41c5-a1a5-9d9a9e14aacc/265-552)).
Disabling ML is a temp workaround added in
803945c759, but we likely need it enabled
in the future anyways, so best to find a proper fix. Tracking at
https://github.com/elastic/kibana-operations/issues/96

### Reviewers
You can view a run of the new pipeline during testing
[here](https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/84).

---------

Co-authored-by: Tiago Costa <tiago.costa@elastic.co>
2024-04-26 16:41:56 -07:00
dkirchan
6569a55268
[Security][Quality Gate] Stabilizing API tests and fixing Codeowners (#181877)
## Summary

- Fixed some incorrect package.json scripts in order to completely stabilize
the api-integration test suite for serverless.
- ~Added @elastic/security-engineering-productivity as CODEOWNERS for
all the work done around the second quality gate in .buildkite folder.~
2024-04-26 19:06:02 +02:00
dkirchan
328609e349
[Security][Serverless] Fixed PROXY_URL in api integration tests (#181835)
## Summary

Removed hard coded value for PROXY_URL.
2024-04-26 15:00:53 +02:00
Konrad Szwarc
96bf7b1f06
[MKI][EDR Workflows] Enable MKI on EDR Workflows Cypress tests (#181080)
This PR sets up everything required for running Cypress tests for EDR
Workflows on the MKI QA environment.

MKI pipeline triggered with these changes -
https://buildkite.com/elastic/kibana-serverless-security-solution-quality-gate-defend-workflows/builds/20

---------

Co-authored-by: dkirchan <diamantis.kirchantzoglou@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Paul Tavares <paul.tavares@elastic.co>
Co-authored-by: dkirchan <55240027+dkirchan@users.noreply.github.com>
2024-04-26 14:10:36 +02:00
Alex Szabo
7ad17c36a2
[BK] Migrate batch 6 (api-docs, fleet-packages, secsol-qg-api) (#180784)
## Summary
Migrates 3 pipelines:
- kibana / api-docs / daily
[kibana-api-docs.yml](https://buildkite.com/elastic/kibana-api-docs-daily)
- kibana / serverless / security-solution-quality-gate / api-integration
[kibana-serverless-security-solution-quality-gate-api-integration.yml](https://buildkite.com/elastic/kibana-serverless-security-solution-quality-gate-api-integration)
- kibana / fleet-packages
[kibana-fleet-packages-daily.yml](https://buildkite.com/elastic/kibana-fleet-packages)

Verification:
 - [x] locally tested the RREs for validity
 - [x] pipelines tested through the migration staging pipeline:
   - [x] API-docs -
     https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/80
   - [x] serverless security solution api-integration -
     https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/48#018ef1ed-853d-4649-b008-3a38b9f97923
   - [x] fleet packages -
     https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/49


Part of: https://github.com/elastic/kibana-operations/issues/79
2024-04-26 10:36:30 +02:00
dkirchan
34c4449607
[Security] Quality Gate multi Organization for projects (#181027)
## Summary

In order to run the tests concurrently, we needed the ability to handle
more than one organization and to rotate API keys when creating a
project.

This effort is covered by the work done on the cloud-handler
(@elastic/security-engineering-productivity). The cloud-handler is a
Python FastAPI service connected to a Postgres database, which handles
the multiple organizations needed for the Security Kibana Quality
Gate testing, including the periodic pipeline and the future efforts to
let developers run the tests against a real MKI.

## Description 
Most of the logic is handled in the `parallel_serverless.ts`
script.
[At this
point](https://github.com/elastic/kibana/pull/181027/files#diff-a05c7d7d8448c53e20bbd60881deb4786bfffa3cdf654447732aed02e12b3867R223)
we read the combination of PROXY_URL, PROXY_CLIENT_ID and
PROXY_SECRET. All three must be defined: the first is the URL of the
proxy service, and the latter two are used to authenticate with the
service.

If all three of the above variables are available and the healthcheck
confirms that the service is up and running
([runs in this
line](https://github.com/elastic/kibana/pull/181027/files#diff-a05c7d7d8448c53e20bbd60881deb4786bfffa3cdf654447732aed02e12b3867R255))
then the script starts creating environments through the proxy handler.
Otherwise it falls back to the default single-org execution (with the
problems we have faced and are tackling with this effort).
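
The decision between the proxy flow and the single-org fallback can be sketched like this. It is a simplified illustration rather than the code in `parallel_serverless.ts`: `shouldUseProxy` and `Env` are hypothetical names, and treating an empty string as "not defined" is an assumption.

```typescript
// Environment variables as a plain lookup table.
type Env = Record<string, string | undefined>;

// The three variables named in the description above.
const REQUIRED_PROXY_VARS = ["PROXY_URL", "PROXY_CLIENT_ID", "PROXY_SECRET"] as const;

// Use the proxy only when all three variables are set (non-empty, by
// assumption) and the service healthcheck succeeded; otherwise the caller
// falls back to the default single-organization flow.
function shouldUseProxy(env: Env, proxyIsHealthy: boolean): boolean {
  const allDefined = REQUIRED_PROXY_VARS.every((name) => {
    const value = env[name];
    return value !== undefined && value !== "";
  });
  return allDefined && proxyIsHealthy;
}
```

Keeping the healthcheck result as a separate argument makes the fallback path easy to exercise without a running proxy service.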

If the flow proceeds with the proxy service, it creates the
environment (the create-environment request body is unchanged,
so no change is needed in the test codebase) and the response
body indicates the organization name that is being used,
e.g.:
```
{
    "alias": "local-gizmo-tests-e2ebcd",
    "cloud_id": "local-gizmo-tests:ZXUtd2VzdC0xLmF3cy5xYS5lbGFzdGljLmNsb3VkJGUyZWJjZGZmMzY0YTRmYjliMjRmOGVkMGM0MjI2NThlLmVzJGUyZWJjZGZmMzY0YTRmYjliMjRmOGVkMGM0MjI2NThlLmti",
    "project_id": "e2ebcdff364a4fb9b24f8ed0c422658e",
    "name": "local-gizmo-tests",
    "region_id": "aws-eu-west-1",
    "project_type": "security",
    "admin_features_package": "standard",
    "creds_password": "f6RoNM84wQ4tBml3p13069uJ",
    "creds_username": "admin",
    "elasticsearch_endpoint": "https://local-gizmo-tests-e2ebcd.es.eu-west-1.aws.qa.elastic.cloud",
    "kibana_endpoint": "https://local-gizmo-tests-e2ebcd.kb.eu-west-1.aws.qa.elastic.cloud",
    "created_at": "2024-04-22T15:05:28.970745",
    "id": 1856,
    "organization_id": 16,
    "organization_name": "sec-sol-auto-01"
}
```

This organization name then determines the roles file that the SAML
authentication uses to authenticate the users. This change is
implemented in the following parts:
- [The PROXY_ORG Cypress env
var](https://github.com/elastic/kibana/pull/181027/files#diff-a05c7d7d8448c53e20bbd60881deb4786bfffa3cdf654447732aed02e12b3867R475)
is defined.
- [A roles filename is
created](https://github.com/elastic/kibana/pull/181027/files#diff-5537ddd27eb2b8d7a4809e1bd9a28a4e6c23f3caa6a9b504b9c94ee037070315R34)
only if PROXY_ORG is defined, and is handed over to the
SamlSessionManager.
- [If the roles filename is
provided,](https://github.com/elastic/kibana/pull/181027/files#diff-f63bfdabc35b838460de6b7e758d1bc168b54ba6ff418a8ad936d716c88af964R51)
then it respects it, otherwise it uses the default `role_users.json`
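
The roles-file selection can be sketched as two small helpers. This is an assumption-laden illustration: the per-organization filename pattern (`<org>.json`) is hypothetical, since the description does not give the real pattern, and `rolesFilenameFor` / `resolveRolesFilename` are made-up names. Only the default `role_users.json` comes from the description.

```typescript
// Default roles file named in the description.
const DEFAULT_ROLES_FILENAME = "role_users.json";

// Hypothetical naming scheme: one roles file per organization.
// Returns undefined when PROXY_ORG is not set, so no filename is created.
function rolesFilenameFor(proxyOrg: string | undefined): string | undefined {
  return proxyOrg ? `${proxyOrg}.json` : undefined;
}

// Respect an explicitly provided roles filename, else use the default.
function resolveRolesFilename(provided: string | undefined): string {
  return provided !== undefined ? provided : DEFAULT_ROLES_FILENAME;
}
```

With this split, the single-org flow needs no special handling: it passes nothing and gets the default file.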


## Relevant successful executions:
-
https://buildkite.com/elastic/security-serverless-quality-gate-kibana-periodic/builds/202
-
https://buildkite.com/elastic/security-serverless-quality-gate-kibana-periodic/builds/203

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Gloria Hornero <gloria.hornero@elastic.co>
2024-04-25 14:50:36 +02:00
Jon
f2f5ec927c
[build/serverless] Do not use spot instances (#181578)
This build step doesn't support retries, if the docker image has already
been uploaded once it exits early. We want to rule out spot preemptions
as a cause of failure.
2024-04-24 10:26:11 -05:00
Gloria Hornero
c351e14e83
[Security Solution] Adds serverlessQA tag to the API tests (#180773)
Continues the effort from https://github.com/elastic/kibana/pull/179737: we are
aligning the tags on Cypress and API tests to have a unified experience.

## Summary

We want to start integrating our Cypress tests with the serverless
Kibana quality gate. However, not all the teams feel comfortable
enabling all the tests. To facilitate the effort of enabling tests in
the quality gate, we are adding the `@serverlessQA` tag, now on API tests
as well.

We use tags to select which tests we want to execute on each environment
and pipeline.

`ess` - runs in ESS env
`serverless` - runs in serverless env and periodic pipeline (failures
don't block release)
`serverlessQA` - runs in kibana release process (failures block release)
`skipInEss` - skipped for ESS env
`skipInServerless` - skipped for all serverless related environments
`skipInServerlessMKI` - skipped for MKI environments

### Description

**Tests tagged as `@serverless`**

All the tests tagged as `@serverless` will be executed as part of the PR
validation process using the serverless FTR environment (not a real
one). Those tests will also be executed as part of the periodic
pipeline using a real serverless project in the QA environment, with the
latest available commit in main at the time of the execution.

**Tests tagged as `@serverlessQA`**

All the tests tagged as `@serverlessQA` will be executed as part of the
kibana release process using a real serverless project with the latest
image available in the QA environment.

**Tests tagged as `@ess`**

All the tests tagged as `@ess` will be executed as part of the PR
validation process using an on-prem ESS environment.

**Tests tagged as `@skipInServerless`**

All the tests tagged as `@skipInServerless` will be excluded from the PR
validation process using the serverless FTR environment, the periodic
pipeline and kibana release process for Serverless.

**Tests tagged as `@skipInEss`**

All the tests tagged as `@skipInEss` will be excluded from the PR
validation process using an on-prem ESS environment.
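
The tag rules above can be read as an include/exclude table per run target. The sketch below is one possible encoding, not the actual test-runner logic: the target names and `shouldRun` are hypothetical, and it assumes that both the periodic pipeline and the release process count as MKI environments for `@skipInServerlessMKI`.

```typescript
// Hypothetical names for the places where tests run.
type Target = "ess" | "serverlessFtr" | "periodic" | "release";

interface Rule {
  include: string; // tag a test must carry to be selected
  exclude: string[]; // tags that remove a test from the run
}

const RULES: Record<Target, Rule> = {
  ess: { include: "@ess", exclude: ["@skipInEss"] },
  serverlessFtr: { include: "@serverless", exclude: ["@skipInServerless"] },
  // Assumption: periodic and release runs use real (MKI) projects.
  periodic: { include: "@serverless", exclude: ["@skipInServerless", "@skipInServerlessMKI"] },
  release: { include: "@serverlessQA", exclude: ["@skipInServerless", "@skipInServerlessMKI"] },
};

// A test runs on a target when it carries the include tag
// and none of the exclude tags.
function shouldRun(tags: string[], target: Target): boolean {
  const rule = RULES[target];
  const hasTag = (tag: string) => tags.indexOf(tag) !== -1;
  return hasTag(rule.include) && !rule.exclude.some(hasTag);
}
```

For example, a test tagged only `@serverless` is selected for PR validation and the periodic pipeline, but not for the release process until `@serverlessQA` is added.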

---------

Co-authored-by: Ryland Herrick <ryalnd@gmail.com>
2024-04-23 18:21:13 +02:00
Alex Szabo
507986e9db
[BK] Migrate buildkite batch 5 (unsupported ftr / flaky test runner) (#180403)
## Summary

Validation:
 - [x] RREs checked locally
 - [x] Pipelines staged
   - [x] Unsupported FTRs:
     https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/30
     (expecting similar errors with the same parameterization of
     [this](https://buildkite.com/elastic/kibana-on-merge-unsupported-ftrs/builds/14597#018ec7ff-2ab8-4cd2-a533-00aec6287e88))
   - [x] Flaky test runner:
     https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/35
 - [x] Considerations for:
   - [x] 7.17:
     https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/46
     (PR: https://github.com/elastic/kibana/pull/180575)
   - [x] 8.13:
     https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/43
     (PR: https://github.com/elastic/kibana/pull/180602)

Backporting should be done manually, as the auto-backports will easily
fail.

Part of: https://github.com/elastic/kibana-operations/issues/79
Migrates: 
- kibana / on merge unsupported ftrs
[kibana-on-merge-unsupported-ftrs.yml](https://buildkite.com/elastic/kibana-on-merge-unsupported-ftrs)
- kibana / flaky-test-suite-runner
[kibana-flaky.yml](https://buildkite.com/elastic/kibana-flaky-test-suite-runner)
2024-04-17 11:13:15 +02:00
Dzmitry Lemechko
dcf7206ad3
[ci] increase perf run step timeout to 90m (#180738)
## Summary

Running journeys on CI workers takes longer than on bare metal, and folks are
adding more journeys. Increasing the timeout for the step.
2024-04-15 00:33:43 -07:00
Jonathan Budzenski
8162230b1c
[ci] Fix quick checks key
2024-04-11 14:34:45 -05:00
Jon
c1e76ad2ff
[ci/on-merge] Wait for quick checks and build to complete (#180537)
Before proceeding with tests. This is already implemented in the pull
request pipeline, and will allow the pipeline to end early if there are
lint errors.
2024-04-11 08:31:27 -05:00
Julia Bardi
b328a5e6e8
[Fleet] added Fleet synthetic check to staging quality gates (#180461)
## Summary

Closes https://github.com/elastic/ingest-dev/issues/3065

Added Fleet synthetic monitor check to Kibana staging quality gates.
It has been stable over the past two weeks and is added with soft fail for now.

This monitor verifies that a long running project is healthy in staging.
It aims to flag issues if there is a breaking change in Kibana / Fleet
plugin.
2024-04-10 16:35:03 +02:00
Alex Szabo
731174bcf8
[BK] Migrate Batch 4 (ES verification) (#180346)
## Summary
Creates new Buildkite RRE definitions for batch 4 (ES snapshot
verification + ES serverless image verification).
Updates agent targeting rules in affected pipeline implementations.

- [x] RREs validated with `docker.elastic.co/ci-agent-images/pipelib`'s
scripts locally
- [x] Tested pipelines through the pipeline staging job
- [x] Serverless suite:
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/22
- [x] ES Snapshot build:
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/26
- [x] ES Snapshot verify:
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/27
(basically started running, but failed due to test failures - doesn't
seem to be related to the infra change)
- [x] ES Snapshot promote:
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/28

Part of: https://github.com/elastic/kibana-operations/issues/79
Migration of: 
- kibana / elasticsearch serverless verify and promote
[kibana-es-snapshots.yml](https://buildkite.com/elastic/kibana-elasticsearch-serverless-verify-and-promote)
- kibana / elasticsearch snapshot build
[kibana-es-snapshots.yml](https://buildkite.com/elastic/kibana-elasticsearch-snapshot-build)
- kibana / elasticsearch snapshot promote
[kibana-es-snapshots.yml](https://buildkite.com/elastic/kibana-elasticsearch-snapshot-promote)
- kibana / elasticsearch snapshot verify
[kibana-es-snapshots.yml](https://buildkite.com/elastic/kibana-elasticsearch-snapshot-verify)

---------

Co-authored-by: Jon <jon@budzenski.me>
2024-04-10 11:00:32 +02:00
Alex Szabo
97c0e1d445
[CI] Add integration tests ES serverless verification (#180317)
## Summary
Prior to this change, we only ran Serverless-related FTR tests on the
[ES Serverless verification & promotion
job](https://buildkite.com/elastic/kibana-elasticsearch-serverless-verify-and-promote).
This was not complete coverage: it left out some of
the serverless tests in the jest-integration set.

This PR adds running (the complete) Jest Integration test set on the
verification pipeline (we currently don't have a way to filter for
serverless-related tests only), and this required the integration test
startup code to start respecting the ES Serverless docker image override
we use in the test setup.
2024-04-09 09:46:03 +02:00
Dario Gieselaar
685e1b5eba
[AI Assistant] Compatibility with Portkey Gateway (#179026)
- Adds option for custom headers in OpenAI connector, which is needed to
configure [Portkey's gateway](https://github.com/Portkey-AI/gateway)
- Removes `additionalProperties` and `additionalItems`, which are not
compatible with OpenAPI (which is what Google Gemini uses)
- Uses `tools` instead of `functions`, which is converted by Portkey
Gateway (`functions` is ignored/passed through as-is)

---------

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2024-04-04 07:51:57 -07:00
Alex Szabo
b11f758785
[BK] Buildkite migration: recreate pipelines with no history retention (#179822)
## Summary
This PR migrates some pipelines that can be migrated the cheap way: by
deleting them, and re-creating them the backstage-way.

Recreates pipelines removed in:
https://github.com/elastic/kibana-buildkite/pull/168

Plus:
- adds the coverage job as a RRE, as it was not previously managed by
terraform, but we got a green light to remove the manually created
pipeline, and recreate it
- adds a script to update locations.yml after any updates (useful to
settle conflicts, or manual additions)
- updates the slack channel for the grammar update script (as requested
by @stratoula)
 
 Todos:
- [x] Fix grammar sync script, to work on new infra (ssh/https
switchover + access rights)
(https://github.com/elastic/kibana/pull/179921)
- [x] Fix missing `antlr` issue:
https://buildkite.com/elastic/kibana-migration-pipeline-staging/builds/16#018e9fab-1609-4ae2-b771-3b346cc616ac
2024-04-04 15:37:31 +02:00
Patryk Kopyciński
aa6a905ffd
[security_solution] Smarter retries for Cypress (#179585)
## Summary

This PR introduces a significant improvement in the way we handle
failing tests within our Cypress test suite. Previously, when a test
spec failed during a job, our approach was to retry the entire set of
specs, which was not only time-consuming but also inefficient. This
process often resulted in unnecessary reruns of tests that had already
passed, leading to increased resource consumption and longer feedback
cycles for developers.

With the changes introduced in this PR, we now target a more efficient
and logical approach by retrying only the specific spec that failed,
rather than the entire suite. This focused retry logic means that if a
job encounters a failing test, only that particular test will be rerun.
This adjustment significantly reduces the overall execution time of our
test suite and minimizes the consumption of valuable Buildkite resources.

Key benefits of this change include:

- **Reduced Test Execution Time**: By avoiding unnecessary reruns of
passing tests, we significantly cut down the total time spent on test
executions.
- **Improved Resource Utilization**: This change ensures a more
judicious use of our CI/CD resources, allowing for more efficient
processing of jobs and reducing potential bottlenecks in our testing
pipeline.
- **Faster Feedback Loops**: Developers will receive quicker feedback on
the status of their tests, enabling them to address failures more
promptly and efficiently.
- **Increased Test Suite Reliability**: By focusing on retrying only the
failing tests, we can more accurately identify flaky tests and work
towards improving the stability of our test suite.
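
The targeted retry idea can be sketched as follows. This is a simplified, synchronous illustration and not the implementation from this PR: `RunSpec` and `runWithTargetedRetries` are hypothetical names, and a real Cypress runner would be asynchronous.

```typescript
// Hypothetical runner signature: returns true when the spec passes.
type RunSpec = (spec: string) => boolean;

// Run every spec once, then re-run only the specs that failed,
// for at most `maxRetries` additional rounds.
// Returns the specs that are still failing after all retries.
function runWithTargetedRetries(specs: string[], run: RunSpec, maxRetries: number): string[] {
  let failing = specs.filter((spec) => !run(spec));
  for (let attempt = 0; attempt < maxRetries && failing.length > 0; attempt++) {
    // Only the previously failing specs are executed again.
    failing = failing.filter((spec) => !run(spec));
  }
  return failing;
}
```

Specs that pass on the first run are never executed again, which is where the time and resource savings come from.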
2024-03-29 16:28:36 +01:00
dkirchan
01cf91240d
[Security] Migrate security quality gate pipelines and ESS (#179606)
As part of the migration of pipelines from kibana-buildkite repo here, I
have migrated the Defend Workflows pipeline and the ESS Security
Solution.

Relevant PR in kibana-buildkite repo:
https://github.com/elastic/kibana-buildkite/pull/166
2024-03-28 14:23:57 +01:00
dkirchan
b81d5a9a42
[Security] Split quality gate security solution pipelines to add more granularity. (#179145)
This PR aims to deprecate the old all in one pipeline for kibana
serverless security solution cypress and split it into one pipeline per
team to add more granularity.

We need to split the pipelines and then add as well one Test suite per
team.

All this PR does is the split into multiple pipelines, one per team,
with their relevant test suite in buildkite:
- Detections Engine - [Buildkite Test
Suite](https://buildkite.com/organizations/elastic/analytics/suites/serverless-mki-cypress-detection-engine)
- Entity Analytics - [Buildkite Test
Suite](https://buildkite.com/organizations/elastic/analytics/suites/serverless-mki-cypress-entity-analytics)
- Explore - [Buildkite Test
Suite](https://buildkite.com/organizations/elastic/analytics/suites/serverless-mki-cypress-explore)
- Gen AI - [Buildkite Test
Suite](https://buildkite.com/organizations/elastic/analytics/suites/serverless-mki-cypress-gen-ai)
- Investigations - [Buildkite Test
Suite](https://buildkite.com/organizations/elastic/analytics/suites/serverless-mki-cypress-investigations)
- Rule management - [Buildkite Test
Suite](https://buildkite.com/organizations/elastic/analytics/suites/serverless-mki-cypress-rule-management)


Relevant Tickets:
- https://github.com/elastic/security-team/issues/8903
- https://github.com/elastic/security-team/issues/8801

---------

Co-authored-by: Alex Szabo <alex.szabo@elastic.co>
2024-03-27 18:21:16 +01:00
Jonathan Budzenski
6e215b7096
[ci/on_merge] Fix dependency
2024-03-22 10:03:06 -05:00
Alex Szabo
d7865619c8
[Ops] Adjust serverless release settings (#179250)
## Summary
When we rolled out the pipeline settings resulting from the defaults of
the backstage-way of defining resources, we encountered a few defaults
we didn't know about.

This PR adjusts these missing values (they're not relevant for this job,
as this job is more a process starter than a branch-related build job)
and renames a used pipeline implementation to something more accurate.
2024-03-22 14:31:30 +01:00
Jon
8fcf476cbe
[ci] Re-add Defend Workflows to on-merge (#179112)
We recently had failures on the es serverless promotion pipeline and
noted these tests were not running on merge. This re-adds the tests to
on-merge to be consistent with the rest of the cypress tests.
2024-03-22 08:22:18 -05:00
Alex Szabo
09a9f71b89
[Ops/BK] Migrate serverless-release (#179063)
Candidate for the first buildkite pipeline to be migrated from
`kibana-buildkite` to the elastic-wide system.
The pipeline represented is:
https://buildkite.com/elastic/kibana-serverless-release-1

Quirk:
- When the pipeline was created, another pipeline with the same name was
also created (it has since been removed), so this one was assigned a
`-1` suffix. Hopefully this is not a problem when we consider the
takeover.

This PR contains:
- `kibana-serverless-release.yml` - an automatic rewrite of the pipeline
resource from
https://github.com/elastic/kibana-buildkite/blob/main/pipelines/kibana-serverless-release.tf
- `locations.yml` - since we collect the pipelines in such a file to
avoid bloating `catalog-info.yaml`, the location needs to be updated
- `create_deployment_tag.yml` - updates the pipeline implementations
with agent targeting rules (since we can no longer use
https://github.com/elastic/buildkite-agent-manager)

---------

Co-authored-by: Jon <jon@budzenski.me>
2024-03-22 11:38:00 +01:00
Drew Tate
1eae619bea
[ES|QL] grammar sync job (#178347)
## Summary

Introduces a CI job to check for changes to the Elasticsearch grammar.

Part of https://github.com/elastic/kibana/issues/178262

The first time this job runs, it will result in a PR to update the
grammar because of formatting differences. That should be merged. Then,
it will only create a PR when something has changed on the Elasticsearch
side.

---------

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2024-03-21 17:01:48 -05:00
Gloria Hornero
76cc2b0be0
[Security Solution] Adds Explore and EA Cypress tests to the security solution mki pipelines (#179054)
## Summary

Adds Explore and EA Cypress executions to the Security Solution MKI
pipelines.
2024-03-20 18:33:40 +01:00
Jonathan Budzenski
f2741f9a65 [ci/verify_es_serverless] Increase Defend Workflows parallelism
These tests are hitting the timeout.  This matches the parallelism used
on pull requests.
2024-03-14 08:59:52 -05:00
Alex Szabo
52ff9b08c8
[Ops] Propagate DRY_RUN to gpctl-promote (#178658)
## Summary
We've had some issues with the weekly scheduled serverless release since
we switched to a direct trigger to `gpctl-promote`.

We've tried a few options and have probably found the solution, but to
be sure, we can still try a dry-run cross trigger.
This wasn't possible before, but
https://github.com/elastic/gpctl/pull/261 should now respect DRY_RUN env
vars coming in.

This PR propagates those variables, so we can test the setup.
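
The propagation described above could look roughly like the sketch below. It is illustrative only and not the code from this PR: `buildPromoteTrigger`, the step label, and the list of forwarded variable names are assumptions. The `trigger` and `build.env` fields are the standard fields of a Buildkite trigger step.

```typescript
// Minimal shape of a Buildkite trigger step; only the fields used in this sketch.
interface TriggerStep {
  trigger: string;
  label: string;
  build: { env: Record<string, string> };
}

// Assumption: DRY_RUN is the only variable that needs forwarding.
const FORWARDED_VARS = ["DRY_RUN"];

// Hypothetical helper: copy the dry-run variables that are set in `env`
// into the env of the triggered `gpctl-promote` build.
function buildPromoteTrigger(env: Record<string, string | undefined>): TriggerStep {
  const forwarded: Record<string, string> = {};
  for (const name of FORWARDED_VARS) {
    const value = env[name];
    if (value !== undefined) {
      forwarded[name] = value;
    }
  }
  return {
    trigger: "gpctl-promote",
    label: "Trigger promotion", // hypothetical label
    build: { env: forwarded },
  };
}
```

In a pipeline script this would be called with `process.env`. Variables that are not set are left out, so the triggered build falls back to its own defaults.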
2024-03-14 10:45:27 +01:00
Alex Szabo
a89fb9b2fb
[Ops] Refactor env loading & fix agent-targeting rewrite (#178320)
## Summary
This PR refactors a bit of the pre-command env setup, separating parts,
so they can be individually skipped. Then it removes the setup-avoidance
based on agent types, as this won't be useful after the migration.

Also, it fixes a missed bit in the agent-targeting rewrite used for the
migration, where the `provider: 'gcp'` was missing, and adds an optional
targeting for the script.

- add gcp as provider to all rewritten agent targeting rules
- add option to target specific pipelines
- refactor env-var loading to a separated file
- refactor node installs so it can be switched by a flag
- skip node installing in (some) jobs that don't require it
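
As a rough illustration of the first bullet, the provider fix amounts to something like the sketch below. It is not the rewrite script itself: the `AgentRule` shape, the `withGcpProvider` name, and the `machineType` key in the usage note are hypothetical, and only `provider: 'gcp'` comes from the description above.

```typescript
// Hypothetical shape of an agent targeting rule.
type AgentRule = Record<string, string | number | boolean>;

// Return a copy of the rule that has a provider set.
// A provider that is already present is kept; otherwise 'gcp' is used.
function withGcpProvider(rule: AgentRule): AgentRule {
  return { provider: "gcp", ...rule };
}
```

For example, `withGcpProvider({ machineType: "n2-standard-4" })` would yield a rule that has both `provider` and `machineType`.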
2024-03-12 16:31:26 +01:00
Alex Szabo
ab10cc2d1d
[Ops] Prevent emergency-release image build on commits already in main (#177736)
## Summary
We'd like to prevent container builds when forked to a
`deploy-fix@<timestamp>` branch, on commits that are already contained
in `main` (thus already built into an image).

This happens because, when forking off, the branch is created from the
commit at `main`'s `HEAD`; the pipeline picks up that first commit and
fails on building a container that already exists.

Solution:
- check if the current commit is in `(upstream|origin)/main` - if it is,
we don't need to emit the trigger step.
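
A minimal sketch of that check, assuming it is done with `git merge-base --is-ancestor` (the actual script may use a different git command). The names `isAncestorOf` and `shouldEmitTrigger` are hypothetical, and the check is injectable so the decision logic can be exercised without a repository.

```typescript
import { execFileSync } from "node:child_process";

type AncestorCheck = (commit: string, ref: string) => boolean;

// `git merge-base --is-ancestor A B` exits with 0 when A is an ancestor of B.
// Any other exit code (including a missing ref) is treated as "not contained".
function isAncestorOf(commit: string, ref: string): boolean {
  try {
    execFileSync("git", ["merge-base", "--is-ancestor", commit, ref], {
      stdio: "ignore",
    });
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: emit the trigger step only when the commit is
// contained in none of the given main refs.
function shouldEmitTrigger(
  commit: string,
  mainRefs: string[] = ["upstream/main", "origin/main"],
  check: AncestorCheck = isAncestorOf
): boolean {
  return !mainRefs.some((ref) => check(commit, ref));
}
```

Treating a missing remote as "not contained" means that only one of `upstream` or `origin` has to exist for the check to work.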

Tests:
- [x] Test trigger: in [this
build](https://buildkite.com/elastic/kibana-serverless-emergency-release-branch-testing/builds/12#_),
I accidentally inverted the DRY_RUN functionality, but at least we know
the trigger works if needed.
- [x] Test with a supplied commit sha (this
[build](https://buildkite.com/elastic/kibana-serverless-emergency-release-branch-testing/builds/14#018dd6fe-2d3d-4430-adf2-e8dd50c8f79c))

Bonus:
- Fixes an emoji (in a different trigger step) that's nonexistent in
Buildkite, but we just copied it over from other labels 🤷 (from #176505)

Closes: https://github.com/elastic/kibana-operations/issues/68
2024-03-11 17:46:15 +01:00
dkirchan
ae6cab1026
[Security] Fixed FTR API Integration test suites (#178188)
## Summary

After some changes and refactors in the codebase, the yaml file that
spins up the tests was affected and stopped working. This PR makes the
fixes required to get the serverless tests running again.
2024-03-07 13:04:17 +02:00
Gloria Hornero
5cc19a13a1
[Security Solution] Adds investigations execution to the security solution mki pipelines (#178072)
## Summary

Adds investigations execution to the security solution mki pipelines.
2024-03-07 10:59:09 +01:00
Alex Szabo
7390350791
[Ops] Create a pipeline staging job (#178136)
## Summary
This job is to help with the kibana-buildkite -> elastic-wide buildkite
migration (https://github.com/elastic/kibana-operations/issues/15).

The idea is the following:
- this PR will create a backstage resource in `catalog-info.yaml` that
triggers the creation of a buildkite pipeline. (*)
- this pipeline will be within the `gobld` universe, using the
elastic-wide infrastructure and agent images.
- if we use this pipeline, edit, and run on a specific branch, we can
test further pipelines in the `gobld` universe, thus we can test a
pipeline's behavior before creating a resource for it, or altering the
currently existing pipelines

(*) - by creating this pipeline, it also tests the idea of having a
router file (`locations.yml`) instead of cramming every pipeline def to
`catalog-info.yaml`
2024-03-07 09:44:09 +01:00