Commit graph

11 commits

Author SHA1 Message Date
Dzmitry Lemechko
aa1037f958
[scalability testing] get the correct Gatling report (#153089)
## Summary

Adjusting the logic to pick the correct Gatling report after a run.

It turns out `startsWith` was picking the wrong report, since two API
journey names match the same prefix:


`api.telemetry.cluster_stats.no_cache.json`
`api.telemetry.cluster_stats.no_cache.1600_dataviews.json`

This PR fixes the issue, so that we report stats to Telemetry for the
correct journey.
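For illustration, a minimal sketch of the ambiguity (hypothetical names and helper, not the actual implementation):

```ts
// Hypothetical sketch: prefix matching is ambiguous when one journey name is
// a prefix of another; an exact match on the journey name avoids it.
const reports = [
  'api.telemetry.cluster_stats.no_cache',
  'api.telemetry.cluster_stats.no_cache.1600_dataviews',
];

// Buggy: matches both entries, so it may return the wrong report.
const buggy = reports.find((name) =>
  name.startsWith('api.telemetry.cluster_stats.no_cache')
);

// Fixed: only the intended journey matches.
const fixed = reports.find(
  (name) => name === 'api.telemetry.cluster_stats.no_cache'
);
```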

Testing here
https://buildkite.com/elastic/kibana-apis-capacity-testing/builds/450

---------

Co-authored-by: Tre' <wayne.seymour@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2023-03-10 13:55:34 +01:00
Alejandro Fernández Haro
84cc0eb30f
[Telemetry] Add scalability tests for known bottlenecks (#151110)
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2023-03-10 10:45:36 +01:00
Dzmitry Lemechko
bc44f524ac
[performance] use journey's own FTR config to run scalability test (#152596)
While debugging a scalability testing failure for the
`cloud_security_dashboard` journey, I found that we had hardcoded the base
FTR config to `x-pack/performance/journeys/login.ts`; the main issue is that
Kibana is not started properly.

This PR makes a few changes:
- update `kbn-performance-testing-dataset-extractor` to save the journey
path as `configPath`, so it can later be used to start ES/Kibana in the
scalability run with the same configuration as the single-user journey run
- update the scalability entry configuration to read the base FTR config
from the generated scalability JSON file (the `configPath` property); a
sketch follows this list
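A minimal sketch of the second change (the JSON shape is assumed; the real extractor output contains more fields):

```ts
// Sketch: resolve the base FTR config from the generated scalability JSON,
// assuming the extractor saved the journey path as `configPath`.
import { readFileSync } from 'fs';
import { resolve } from 'path';

const journeyJson = 'x-pack/test/scalability/apis/api.core.capabilities.json';
const { configPath } = JSON.parse(readFileSync(journeyJson, 'utf8')) as {
  configPath: string;
};

// e.g. x-pack/performance/journeys/login.ts for the api capacity tests
const baseFtrConfig = resolve(process.cwd(), configPath); // run from kibana root
```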

How to test:
- make sure to clone the latest
[kibana-load-testing](https://github.com/elastic/kibana-load-testing)
repo and build it with `mvn clean test-compile`
- from the kibana root directory, run any API capacity test
```
node scripts/run_scalability.js --journey-path x-pack/test/scalability/apis/api.core.capabilities.json
```
Expected result: logs should display
```
debg Loading config file from x-pack/performance/journeys/login.ts
```
- download the latest artifacts from
[buildkite](https://buildkite.com/elastic/kibana-performance-data-set-extraction/builds/171#0186a342-9dea-4a9b-bbe4-c96449563269),
find `cloud_security_dashboard-<uuid>.json`
- from the kibana root directory, run the scalability test for the
`cloud_security_dashboard` journey
```
node scripts/run_scalability.js --journey-path <path to cloud_security_dashboard-<uuid>.json>
```
Expected result: logs should display 
```
debg Loading config file from x-pack/performance/journeys/cloud_security_dashboard.ts
```

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2023-03-07 13:20:21 +01:00
Dzmitry Lemechko
f6353bcba8
[scalability testing] enable ops metrics logger (#151172)
## Summary

Enabling Ops Metrics in the Kibana logs lets us see how memory
consumption and event loop delay change during the test run:

```
[2023-02-14T12:05:39.960+01:00][DEBUG][metrics.ops] memory: 443.0MB uptime: 0:01:09 load: [6.31,5.92,6.18] mean delay: 11.530 delay histogram: { 50: 10.494; 95: 17.416; 99: 26.231 }
[2023-02-14T12:05:44.960+01:00][DEBUG][metrics.ops] memory: 559.1MB uptime: 0:01:14 load: [6.04,5.87,6.16] mean delay: 11.738 delay histogram: { 50: 10.420; 95: 18.072; 99: 31.392 }
[2023-02-14T12:05:49.971+01:00][DEBUG][metrics.ops] memory: 447.9MB uptime: 0:01:19 load: [7.08,6.09,6.24] mean delay: 13.301 delay histogram: { 50: 10.977; 95: 26.313; 99: 34.505 }
[2023-02-14T12:05:54.983+01:00][DEBUG][metrics.ops] memory: 454.0MB uptime: 0:01:24 load: [7.95,6.29,6.31] mean delay: 14.112 delay histogram: { 50: 12.698; 95: 23.069; 99: 36.078 }
[2023-02-14T12:05:59.992+01:00][DEBUG][metrics.ops] memory: 573.2MB uptime: 0:01:29 load: [8.52,6.43,6.36] mean delay: 26.276 delay histogram: { 50: 21.103; 95: 60.850; 99: 99.484 }
[2023-02-14T12:06:05.018+01:00][DEBUG][metrics.ops] memory: 555.8MB uptime: 0:01:35 load: [10.40,6.85,6.51] mean delay: 82.612 delay histogram: { 50: 76.743; 95: 163.447; 99: 170.131 }
[2023-02-14T12:06:10.211+01:00][DEBUG][metrics.ops] memory: 556.3MB uptime: 0:01:40 load: [10.04,6.84,6.51] mean delay: 171.943 delay histogram: { 50: 149.815; 95: 336.069; 99: 341.574 }
```
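For reference, the standard way to get this output in a Kibana logging configuration is to raise the `metrics.ops` logger to debug level (shown here in plain kibana.yml form; the PR presumably wires an equivalent setting through the scalability test setup):

```
logging:
  loggers:
    - name: metrics.ops
      level: debug
```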

While running scalability journeys we write ES and Kibana logs to
separate files, attached as job artifacts:
<img width="1159" alt="image"
src="https://user-images.githubusercontent.com/10977896/218800124-c9e4ed11-4f69-43df-bcad-e3de61bd7ce0.png">

Download server-logs.tar.gz to view the Ops Metrics data.
2023-02-22 17:41:21 +01:00
Dzmitry Lemechko
5c8bf9a94c
[scalability testing] skip unloading archives after journey (#151476)
## Summary

Sometimes scalability testing can leave Kibana unresponsive, which causes
the after hook that unloads kbn archives to
[fail](https://buildkite.com/elastic/kibana-apis-capacity-testing/builds/241#01865418-2579-4559-bd4e-432c48a2104d):

```
2023-02-15T08:33:37.825Z proc [scalability-tests]  proc [gatling: test] Simulation org.kibanaLoadTest.simulation.generic.GenericJourney completed in 268 seconds
2023-02-15T08:38:06.749Z proc [scalability-tests]  proc [gatling: test] java.lang.reflect.InvocationTargetException
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.plugin.util.ForkMain.runMain(ForkMain.java:67)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.plugin.util.ForkMain.main(ForkMain.java:35)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] Caused by: java.lang.RuntimeException: Login request failed: org.apache.http.NoHttpResponseException: localhost:5620 failed to respond
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.helpers.KbnClient.getCookie(KbnClient.scala:72)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.helpers.KbnClient.getClientAndConnectionManager(KbnClient.scala:50)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.helpers.KbnClient.unload(KbnClient.scala:139)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.$anonfun$new$5(GenericJourney.scala:153)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.$anonfun$new$5$adapted(GenericJourney.scala:153)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.$anonfun$testDataLoader$2(GenericJourney.scala:47)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.$anonfun$testDataLoader$2$adapted(GenericJourney.scala:46)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1321)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.testDataLoader(GenericJourney.scala:46)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.$anonfun$new$4(GenericJourney.scala:154)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.core.scenario.Simulation.$anonfun$params$18(Simulation.scala:176)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.core.scenario.Simulation.$anonfun$params$18$adapted(Simulation.scala:176)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at scala.collection.immutable.List.foreach(List.scala:333)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.core.scenario.Simulation.$anonfun$params$17(Simulation.scala:176)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.app.Runner.run(Runner.scala:62)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.app.Gatling$.start(Gatling.scala:89)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.app.Gatling$.fromArgs(Gatling.scala:51)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.app.Gatling$.main(Gatling.scala:39)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.app.Gatling.main(Gatling.scala)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	... 6 more
```

The journey is marked as failed even though we actually got the metrics.
This PR adds a flag to the Gatling runner command that skips cleanup on
journey teardown.
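As a rough sketch of what that looks like from the runner side (the flag name here is hypothetical; the actual property used by kibana-load-testing may differ):

```ts
// Hypothetical sketch: forward a system property to the Gatling run so the
// journey teardown skips unloading archives.
import { spawn } from 'child_process';

spawn('mvn', ['gatling:test', '-DskipCleanupOnTeardown=true'], {
  cwd: '../kibana-load-testing', // assumed checkout location
  stdio: 'inherit',
});
```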
2023-02-22 12:25:48 +01:00
Dzmitry Lemechko
64d57c5c14
[scalability testing] add json to track skipped journeys (#151629)
## Summary

Since developers are adding more API capacity tests, we need an easy way
to quickly skip the failing/unstable ones.

We run these tests on CI by passing the path to the root test folder with
the `--journey-path` flag:

0e18843e03/.buildkite/scripts/steps/scalability/benchmarking.sh (L82-L83)

The idea is that when there is a need to skip a test, it is added to
`x-pack/test/scalability/disabled_scalability_tests.json`:
```
[
   "x-pack/test/scalability/apis/api.telemetry.cluster_stats.json"
]
```

Using a JSON array file seems like an easy option (similar to the Jest
config), but I'm open to suggestions.
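A minimal sketch of how the runner could honor that file (assumed wiring, not the actual implementation):

```ts
// Sketch: drop journeys listed in the disabled-tests JSON before handing
// them to the runner; paths are compared relative to the kibana root.
import { readFileSync } from 'fs';
import { relative } from 'path';

const disabled: string[] = JSON.parse(
  readFileSync('x-pack/test/scalability/disabled_scalability_tests.json', 'utf8')
);

function filterJourneys(journeyPaths: string[]): string[] {
  // assume we run from the kibana root, so relative paths match the JSON
  return journeyPaths.filter((p) => !disabled.includes(relative(process.cwd(), p)));
}
```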
2023-02-21 17:58:50 +01:00
Dzmitry Lemechko
ff77de662d
[api capacity testing] Adjust endpoint limits (#149333)
## Summary

This PR tweaks the max load and thresholds for some APIs in order to get
more consistent
[results](c4b07f90-58eb-563e-8e59-3be09c8074f8?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-3d,to:now))).

Current main (last 3 days):
<img width="1534" alt="Screenshot 2023-01-24 at 14 53 15"
src="https://user-images.githubusercontent.com/10977896/214314206-0b849053-9473-4a98-9f3e-e90461a5a341.png">

PR (4 runs):

<img width="1569" alt="Screenshot 2023-01-24 at 15 00 49"
src="https://user-images.githubusercontent.com/10977896/214314530-60a5b3e4-2730-4307-946d-cf25d57d04ab.png">
2023-01-25 09:20:49 +01:00
Dzmitry Lemechko
5f31ebf1ce
Benchmark single apis (#146297)
## Summary

This PR adds the capability to run capacity testing for single APIs (#143066)

Currently in main we have two types of performance tests:
- a single-user performance journey that simulates a single end-user's
experience in the browser
- a scalability journey that uses APM traces from the single-user journey
to simulate the experience of multiple end-users

This new type of performance test allows us to better understand how each
individual server API scales under similar load.

How to run locally:
- make sure to clone the latest main branch of
[elastic/kibana-load-testing](https://github.com/elastic/kibana-load-testing)
- in the Kibana repo root, run:
```
node scripts/run_scalability.js --journey-path x-pack/test/scalability/apis/api.core.capabilities.json
```

How it works:
FTR is used to start Kibana/ES and run the Gatling simulation with the JSON
file as input. After the run, the latest report matching the journey name is
parsed to extract performance metrics, which are reported via EBT to the
Telemetry cluster.

How it will run after merge:
I plan to run the pipeline every 3 hours on a bare-metal machine and report
metrics to the Telemetry staging cluster.
<img width="2023" alt="image"
src="https://user-images.githubusercontent.com/10977896/208771628-f4f5dbcb-cb73-40c6-9aa1-4ec3fbf5285b.png">


APM traces are collected and reported to Kibana stats cluster:
<img width="1520" alt="image"
src="https://user-images.githubusercontent.com/10977896/208771323-4cca531a-eeea-4941-8b01-50b890f932b1.png">


What metrics are collected:

1. warmupAvgResponseTime - average response time during the warmup phase
2. rpsAtWarmup - average requests per second during the warmup phase
3. warmupDuration - duration of the warmup phase
4. responseTimeMetric (default: 85%) - Gatling reports response time
percentiles (25/50/75/80/85/90/95/99), as well as min/max values
5. threshold1ResponseTime (default: 3000 ms)
6. rpsAtThreshold1 - requests per second when `responseTimeMetric` first
reaches threshold1ResponseTime
7. threshold2ResponseTime (default: 9000 ms)
8. rpsAtThreshold2 - requests per second when `responseTimeMetric` first
reaches threshold2ResponseTime
9. threshold3ResponseTime (default: 15000 ms)
10. rpsAtThreshold3 - requests per second when `responseTimeMetric` first
reaches threshold3ResponseTime
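For reference, these map naturally onto a shape like the following (an illustrative sketch; the actual EBT event schema may differ):

```ts
// Sketch of the reported metrics, assuming one event per journey run.
interface ScalabilityMetricsEvent {
  journeyName: string;            // assumed identifier field
  warmupAvgResponseTime: number;  // ms, average during warmup
  rpsAtWarmup: number;            // requests per second during warmup
  warmupDuration: number;         // ms
  responseTimeMetric: string;     // e.g. '85%'
  threshold1ResponseTime: number; // ms, default 3000
  rpsAtThreshold1: number;        // rps when the metric first hits threshold 1
  threshold2ResponseTime: number; // ms, default 9000
  rpsAtThreshold2: number;        // rps when the metric first hits threshold 2
  threshold3ResponseTime: number; // ms, default 15000
  rpsAtThreshold3: number;        // rps when the metric first hits threshold 3
}
```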

Once we agree on the metrics, I will update the indexer for telemetry.

Co-authored-by: Alejandro Fernández Haro <alejandro.haro@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2023-01-09 16:38:30 +01:00
Spencer
afb09ccf8a
Transpile packages on demand, validate all TS projects (#146212)
## Dearest Reviewers 👋 

I've been working on this branch with @mistic and @tylersmalley and
we're really confident in these changes. Additionally, this changes code
in nearly every package in the repo so we don't plan to wait for reviews
to get in before merging this. If you'd like to have a concern
addressed, please feel free to leave a review, but assuming that nobody
raises a blocker in the next 24 hours we plan to merge this EOD pacific
tomorrow, 12/22.

We'll be paying close attention to any issues this causes after merging
and work on getting those fixed ASAP. 🚀

---

The operations team is not confident that we'll have the time to achieve
what we originally set out to accomplish by moving to Bazel with the
time and resources we have available. We have also bought ourselves some
headroom with improvements to babel-register, optimizer caching, and
typescript project structure.

In order to make sure we deliver packages as quickly as possible (many
teams really want them), with a usable and familiar developer
experience, this PR removes Bazel for building packages in favor of
using the same JIT transpilation we use for plugins.

Additionally, packages now use `kbn_references` (again, just copying the
dx from plugins to packages).

Because of the complex relationships between packages/plugins and in
order to prepare ourselves for automatic dependency detection tools we
plan to use in the future, this PR also introduces a "TS Project Linter"
which will validate that every tsconfig.json file meets a few
requirements:

1. the chain of base config files extended by each config includes
`tsconfig.base.json` and not `tsconfig.json`
2. the `include` config is used, and not `files`
3. the `exclude` config includes `target/**/*`
4. the `outDir` compiler option is specified as `target/types`
5. none of these compiler options are specified: `declaration`,
`declarationMap`, `emitDeclarationOnly`, `skipLibCheck`, `target`,
`paths`

6. all references to other packages/plugins use their pkg id, ie:
	
	```js
    // valid
    {
      "kbn_references": ["@kbn/core"]
    }
    // not valid
    {
      "kbn_references": [{ "path": "../../../src/core/tsconfig.json" }]
    }
    ```

7. only packages/plugins which are imported somewhere in the ts code are
listed in `kbn_references`
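Put together, a package tsconfig.json that satisfies all of these rules looks roughly like this (an illustrative sketch; real configs list their actual includes and references):

```js
{
  "extends": "../../tsconfig.base.json",
  "compilerOptions": {
    "outDir": "target/types"
  },
  "include": ["**/*.ts"],
  "exclude": ["target/**/*"],
  "kbn_references": ["@kbn/core"]
}
```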

This linter is not only validating all of the tsconfig.json files, but
it also will fix these config files to deal with just about any
violation that can be produced. Just run `node scripts/ts_project_linter
--fix` locally to apply these fixes, or let CI take care of
automatically fixing things and pushing the changes to your PR.

> **Example:** [`64e93e5`
(#146212)](64e93e5806)
When I merged main into my PR it included a change which removed the
`@kbn/core-injected-metadata-browser` package. After resolving the
conflicts I missed a few tsconfig files which included references to the
now removed package. The TS Project Linter identified that these
references were removed from the code and pushed a change to the PR to
remove them from the tsconfig.json files.

## No bazel? Does that mean no packages??
Nope! We're still doing packages but we're pretty sure now that we won't
be using Bazel to accomplish the 'distributed caching' and 'change-based
tasks' portions of the packages project.

This PR actually makes packages much easier to work with and will be
followed up with the bundling benefits described by the original
packages RFC. Then we'll work on documentation and advocacy for using
packages for any and all new code.

We're pretty confident that implementing distributed caching and
change-based tasks will be necessary in the future, but because of
recent improvements in the repo we think we can live without them for
**at least** a year.

## Wait, there are still BUILD.bazel files in the repo
Yes, there are still three webpack bundles which are built by Bazel: the
`@kbn/ui-shared-deps-npm` DLL, `@kbn/ui-shared-deps-src` externals, and
the `@kbn/monaco` workers. These three webpack bundles are still created
during bootstrap and remotely cached using bazel. The next phase of this
project is to figure out how to get the package bundling features
described in the RFC with the current optimizer, and we expect these
bundles to go away then. Until then any package that is used in those
three bundles still needs to have a BUILD.bazel file so that they can be
referenced by the remaining webpack builds.

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2022-12-22 19:00:29 -06:00
Dzmitry Lemechko
9f7db8f615
[scalability testing] typescript runner (#147002)
## Summary

Closes #146546

This PR replaces the bash script with a node-based runner script.

The script can take a relative path to a directory containing scalability
journey files, or a relative path to an individual journey JSON file:

```
node scripts/run_scalability.js --journey-config-path scalability_traces/server
node scripts/run_scalability.js --journey-config-path scalability_traces/server/api.core.capabilities.json
```
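A sketch of the path handling this implies (hypothetical helper, not the actual runner code):

```ts
// Hypothetical helper: expand a --journey-config-path argument into the
// list of journey JSON files to run.
import { readdirSync, statSync } from 'fs';
import { join } from 'path';

function resolveJourneyConfigs(journeyConfigPath: string): string[] {
  if (statSync(journeyConfigPath).isDirectory()) {
    return readdirSync(journeyConfigPath)
      .filter((name) => name.endsWith('.json'))
      .map((name) => join(journeyConfigPath, name));
  }
  return [journeyConfigPath];
}
```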

2022-12-06 22:21:52 +01:00
Spencer
50b3b57d9e
[ftr] add first-class support for playwright journeys (#140680)
* [ftr] add first-class support for playwright journeys

* [CI] Auto-commit changed files from 'node scripts/generate codeowners'

* fix jest test

* remove ability to customize kibana server args, if we need it we can add it back

* remove dev dir that doesn't exist

* fix typo

* prevent duplicated array conversion logic by sharing flag reader

* remove destructuring of option

* fix scalability config and config_path import

* fix start_servers args and tests

* include simple readme

* fix jest tests and support build re-use when changes are just to jest tests

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2022-09-22 01:06:46 -07:00