kibana/x-pack/test/security_solution_api_integration/scripts
Garrett Spong e9a8909fad
[Security Assistant] Simplifies Security Gen AI Evaluation secret management (#219885)
## Summary

Simplifies secret management for running the Security Gen AI
Evaluations. See updated README.md for full details, but includes:

* Consolidation of multiple vault keys to a single
`KIBANA_SECURITY_GEN_AI_CONFIG` key, which contains all connectors,
LangSmith credentials, and now a way to specify `evaluatorConnectorId`.
* Added `vault` params to both `retrieve_secrets.js` and
`upload_secrets.js` for specifying the vault. Defaults to `sieam-team`
secrets.elastic.co for ease of use by developers.
* Introduces a `get_commands.js` script for fetching commands, either to
hand off to Kibana Ops for updating secrets or to specify config
overrides when manually running Buildkite pipelines.
* Deleted `export_env_secrets.js` as it couldn't be used for setting env
vars locally for the dev testing experience.
* Updated `connectors` as per team discussion to include: GPT-4.1,
Claude 3.5/3.7, and Gemini 2.5 Pro. This was a config change made by
Kibana Ops, so there is no code change in this PR, but you can confirm
it by running `retrieve_secrets.js`.
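
For illustration only, here is a minimal sketch of how a consumer might read and sanity-check the consolidated config. It assumes the value of `KIBANA_SECURITY_GEN_AI_CONFIG` is a plain JSON string and that `connectors` is an object keyed by connector id. The `parseGenAiConfig` helper, the `langSmithApiKey` field name, and the sample values are hypothetical and not taken from the scripts in this PR, so check the README for the actual schema and encoding.

```javascript
// Hypothetical sketch, not the implementation in genai/vault.
// Only `connectors` and `evaluatorConnectorId` are named in the PR description;
// the JSON encoding and the `langSmithApiKey` field are assumptions.
function parseGenAiConfig(raw) {
  if (!raw) {
    throw new Error('KIBANA_SECURITY_GEN_AI_CONFIG is not set');
  }
  const config = JSON.parse(raw);

  // Assumed shape: connectors is an object keyed by connector id.
  if (typeof config.connectors !== 'object' || config.connectors === null) {
    throw new Error('config.connectors must be an object keyed by connector id');
  }

  // If an evaluator connector is specified, it should refer to a known connector.
  const { evaluatorConnectorId } = config;
  if (evaluatorConnectorId !== undefined && !(evaluatorConnectorId in config.connectors)) {
    throw new Error(`evaluatorConnectorId "${evaluatorConnectorId}" is not a known connector`);
  }
  return config;
}

// Example with made-up values standing in for the vault contents.
const example = JSON.stringify({
  connectors: { 'gpt-4-1': { name: 'GPT-4.1' } },
  langSmithApiKey: 'redacted',
  evaluatorConnectorId: 'gpt-4-1',
});
const parsed = parseGenAiConfig(example);
console.log(Object.keys(parsed.connectors));
```

Failing fast on an unknown `evaluatorConnectorId` is a design choice of this sketch: with everything in a single key, a typo would otherwise only surface once an evaluation run is already underway.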

And finally, a much more detailed `README.md` for testing locally, on
PRs and CI, and the process for updating secrets. See the full
[README.md](https://github.com/spong/kibana/blob/ci-eval-tweaks/x-pack/test/security_solution_api_integration/test_suites/genai/evaluations/README.md).



Example LangSmith Runs:

* `ES|QL Generation Regression Suite`: [Run
298372](261dcc59-fbe7-4397-a662-ff94042f666c)
* `Alerts RAG Regression (Episodes 1-8)`: [Run
298372](bd5bba1d-97aa-4512-bce7-b09aa943c651)
* `Assistant Eval: Custom Knowledge`: [Run
298372](2d5f7c18-4bf4-4cdb-97a1-16e39a865cab)
* `Eval AD: All Scenarios`: [Run
300138](4690ee16-9df5-416c-8bf0-b62bc2f2aba9/compare?selectedSessions=6d44134b-6492-4f2d-9b28-6d4a82a0e9ae&baseline=undefined)

Note: there is currently a timing bug with Alerts/KB entries being
cleaned up before the server is complete, so you may see poor evals for
`Alerts RAG Regression (Episodes 1-8)` and `Assistant Eval: Custom
Knowledge` until that is fixed. I'll address this in a follow-up PR
since it is unrelated to this change-set.
2025-05-09 11:01:36 -06:00
genai/vault [Security Assistant] Simplifies Security Gen AI Evaluation secret management (#219885) 2025-05-09 11:01:36 -06:00
api_configs.json [Security][Serverless] Add Product types in FTR API Integration tests. (#184309) 2024-06-20 17:30:35 +03:00
index.js [EDR Workflows] MKI API tests (#187560) 2024-07-12 14:41:41 +02:00
mki_api_ftr_execution.ts [Security Solution][Serverless] Logging - Fix explore test issue (#195941) 2024-10-15 12:16:31 +03:00
mki_start_api_ftr_execution.js [Security][Serverless] Add Product types in FTR API Integration tests. (#184309) 2024-06-20 17:30:35 +03:00