Commit graph

11 commits

Author SHA1 Message Date
Dzmitry Lemechko
aa1037f958
[scalability testing] get the correct Gatling report (#153089)
## Summary

Adjusting the logic to pick the correct Gatling report after a run.

It turns out `startsWith` was picking the wrong report, since two API
journey names match the same prefix:


`api.telemetry.cluster_stats.no_cache.json`
`api.telemetry.cluster_stats.no_cache.1600_dataviews.json`

This PR fixes the issue, so that we report stats to Telemetry for the
correct journey.
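For illustration, a minimal sketch of the ambiguity (hypothetical names and helper, not the actual implementation):

```ts
// Hypothetical sketch: prefix matching is ambiguous when one journey name is
// a prefix of another; an exact match on the journey name avoids it.
const reports = [
  'api.telemetry.cluster_stats.no_cache',
  'api.telemetry.cluster_stats.no_cache.1600_dataviews',
];

// Buggy: matches both entries, so it may return the wrong report.
const buggy = reports.find((name) =>
  name.startsWith('api.telemetry.cluster_stats.no_cache')
);

// Fixed: only the intended journey matches.
const fixed = reports.find(
  (name) => name === 'api.telemetry.cluster_stats.no_cache'
);
```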

Testing here
https://buildkite.com/elastic/kibana-apis-capacity-testing/builds/450

---------

Co-authored-by: Tre' <wayne.seymour@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2023-03-10 13:55:34 +01:00
Alejandro Fernández Haro
84cc0eb30f
[Telemetry] Add scalability tests for known bottlenecks (#151110)
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2023-03-10 10:45:36 +01:00
Dzmitry Lemechko
bc44f524ac
[performance] use journey's own FTR config to run scalability test (#152596)
While debugging a scalability testing failure for the
`cloud_security_dashboard` journey, I found that we had hardcoded the base
FTR config to `x-pack/performance/journeys/login.ts`; the main issue is that
Kibana is not started properly.

This PR makes a few changes:
- update `kbn-performance-testing-dataset-extractor` to save the journey
path as `configPath`, so it can later be used to start ES/Kibana in the
scalability run with the same configuration as the single-user journey run
- update the scalability entry configuration to read the base FTR config
from the generated scalability JSON file (the `configPath` property); a
sketch follows this list
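A minimal sketch of the second change (the JSON shape is assumed; the real extractor output contains more fields):

```ts
// Sketch: resolve the base FTR config from the generated scalability JSON,
// assuming the extractor saved the journey path as `configPath`.
import { readFileSync } from 'fs';
import { resolve } from 'path';

const journeyJson = 'x-pack/test/scalability/apis/api.core.capabilities.json';
const { configPath } = JSON.parse(readFileSync(journeyJson, 'utf8')) as {
  configPath: string;
};

// e.g. x-pack/performance/journeys/login.ts for the api capacity tests
const baseFtrConfig = resolve(process.cwd(), configPath); // run from kibana root
```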

How to test:
- make sure to clone the latest
[kibana-load-testing](https://github.com/elastic/kibana-load-testing)
repo and build it with `mvn clean test-compile`
- from the kibana root directory, run any API capacity test
```
node scripts/run_scalability.js --journey-path x-pack/test/scalability/apis/api.core.capabilities.json
```
Expected result: logs should display
```
debg Loading config file from x-pack/performance/journeys/login.ts
```
- download the latest artifacts from
[buildkite](https://buildkite.com/elastic/kibana-performance-data-set-extraction/builds/171#0186a342-9dea-4a9b-bbe4-c96449563269),
find `cloud_security_dashboard-<uuid>.json`
- from the kibana root directory, run the scalability test for the
`cloud_security_dashboard` journey
```
node scripts/run_scalability.js --journey-path <path to cloud_security_dashboard-<uuid>.json>
```
Expected result: logs should display 
```
debg Loading config file from x-pack/performance/journeys/cloud_security_dashboard.ts
```

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
2023-03-07 13:20:21 +01:00
Dzmitry Lemechko
f6353bcba8
[scalability testing] enable ops metrics logger (#151172)
## Summary

Enabling Ops Metrics in the Kibana logs lets us see how memory
consumption and event loop delay change during the test run:

```
[2023-02-14T12:05:39.960+01:00][DEBUG][metrics.ops] memory: 443.0MB uptime: 0:01:09 load: [6.31,5.92,6.18] mean delay: 11.530 delay histogram: { 50: 10.494; 95: 17.416; 99: 26.231 }
[2023-02-14T12:05:44.960+01:00][DEBUG][metrics.ops] memory: 559.1MB uptime: 0:01:14 load: [6.04,5.87,6.16] mean delay: 11.738 delay histogram: { 50: 10.420; 95: 18.072; 99: 31.392 }
[2023-02-14T12:05:49.971+01:00][DEBUG][metrics.ops] memory: 447.9MB uptime: 0:01:19 load: [7.08,6.09,6.24] mean delay: 13.301 delay histogram: { 50: 10.977; 95: 26.313; 99: 34.505 }
[2023-02-14T12:05:54.983+01:00][DEBUG][metrics.ops] memory: 454.0MB uptime: 0:01:24 load: [7.95,6.29,6.31] mean delay: 14.112 delay histogram: { 50: 12.698; 95: 23.069; 99: 36.078 }
[2023-02-14T12:05:59.992+01:00][DEBUG][metrics.ops] memory: 573.2MB uptime: 0:01:29 load: [8.52,6.43,6.36] mean delay: 26.276 delay histogram: { 50: 21.103; 95: 60.850; 99: 99.484 }
[2023-02-14T12:06:05.018+01:00][DEBUG][metrics.ops] memory: 555.8MB uptime: 0:01:35 load: [10.40,6.85,6.51] mean delay: 82.612 delay histogram: { 50: 76.743; 95: 163.447; 99: 170.131 }
[2023-02-14T12:06:10.211+01:00][DEBUG][metrics.ops] memory: 556.3MB uptime: 0:01:40 load: [10.04,6.84,6.51] mean delay: 171.943 delay histogram: { 50: 149.815; 95: 336.069; 99: 341.574 }
```
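For reference, the standard way to get this output in a Kibana logging configuration is to raise the `metrics.ops` logger to debug level (shown here in plain kibana.yml form; the PR presumably wires an equivalent setting through the scalability test setup):

```
logging:
  loggers:
    - name: metrics.ops
      level: debug
```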

While running scalability journeys we write ES and Kibana logs to
separate files, attached as job artifacts:
<img width="1159" alt="image"
src="https://user-images.githubusercontent.com/10977896/218800124-c9e4ed11-4f69-43df-bcad-e3de61bd7ce0.png">

Download server-logs.tar.gz to view the Ops Metrics data.
2023-02-22 17:41:21 +01:00
Dzmitry Lemechko
5c8bf9a94c
[scalability testing] skip unloading archives after journey (#151476)
## Summary

Sometimes scalability testing can leave Kibana unresponsive, which causes
the after hook that unloads kbn archives to
[fail](https://buildkite.com/elastic/kibana-apis-capacity-testing/builds/241#01865418-2579-4559-bd4e-432c48a2104d):

```
2023-02-15T08:33:37.825Z proc [scalability-tests]  proc [gatling: test] Simulation org.kibanaLoadTest.simulation.generic.GenericJourney completed in 268 seconds
2023-02-15T08:38:06.749Z proc [scalability-tests]  proc [gatling: test] java.lang.reflect.InvocationTargetException
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.plugin.util.ForkMain.runMain(ForkMain.java:67)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.plugin.util.ForkMain.main(ForkMain.java:35)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] Caused by: java.lang.RuntimeException: Login request failed: org.apache.http.NoHttpResponseException: localhost:5620 failed to respond
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.helpers.KbnClient.getCookie(KbnClient.scala:72)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.helpers.KbnClient.getClientAndConnectionManager(KbnClient.scala:50)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.helpers.KbnClient.unload(KbnClient.scala:139)
2023-02-15T08:41:06.006Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.$anonfun$new$5(GenericJourney.scala:153)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.$anonfun$new$5$adapted(GenericJourney.scala:153)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.$anonfun$testDataLoader$2(GenericJourney.scala:47)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.$anonfun$testDataLoader$2$adapted(GenericJourney.scala:46)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1321)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.testDataLoader(GenericJourney.scala:46)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at org.kibanaLoadTest.simulation.generic.GenericJourney.$anonfun$new$4(GenericJourney.scala:154)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.core.scenario.Simulation.$anonfun$params$18(Simulation.scala:176)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.core.scenario.Simulation.$anonfun$params$18$adapted(Simulation.scala:176)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at scala.collection.immutable.List.foreach(List.scala:333)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.core.scenario.Simulation.$anonfun$params$17(Simulation.scala:176)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.app.Runner.run(Runner.scala:62)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.app.Gatling$.start(Gatling.scala:89)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.app.Gatling$.fromArgs(Gatling.scala:51)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.app.Gatling$.main(Gatling.scala:39)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	at io.gatling.app.Gatling.main(Gatling.scala)
2023-02-15T08:41:06.007Z proc [scalability-tests]  proc [gatling: test] 	... 6 more
```

The journey is marked as failed even though we actually got the metrics.
This PR adds a flag to the Gatling runner command that skips cleanup on
journey teardown.
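As a rough sketch of what that looks like from the runner side (the flag name here is hypothetical; the actual property used by kibana-load-testing may differ):

```ts
// Hypothetical sketch: forward a system property to the Gatling run so the
// journey teardown skips unloading archives.
import { spawn } from 'child_process';

spawn('mvn', ['gatling:test', '-DskipCleanupOnTeardown=true'], {
  cwd: '../kibana-load-testing', // assumed checkout location
  stdio: 'inherit',
});
```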
2023-02-22 12:25:48 +01:00
Dzmitry Lemechko
64d57c5c14
[scalability testing] add json to track skipped journeys (#151629)
## Summary

Since developers are adding more API capacity tests, we need an easy way
to quickly skip the failing/unstable ones.

We run these tests on CI by passing the path to the root test folder with
the `--journey-path` flag:

0e18843e03/.buildkite/scripts/steps/scalability/benchmarking.sh (L82-L83)

The idea is that when there is a need to skip a test, it is added to
`x-pack/test/scalability/disabled_scalability_tests.json`:
```
[
   "x-pack/test/scalability/apis/api.telemetry.cluster_stats.json"
]
```

Using a JSON array file seems like an easy option (similar to the Jest
config), but I'm open to suggestions.
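A minimal sketch of how the runner could honor that file (assumed wiring, not the actual implementation):

```ts
// Sketch: drop journeys listed in the disabled-tests JSON before handing
// them to the runner; paths are compared relative to the kibana root.
import { readFileSync } from 'fs';
import { relative } from 'path';

const disabled: string[] = JSON.parse(
  readFileSync('x-pack/test/scalability/disabled_scalability_tests.json', 'utf8')
);

function filterJourneys(journeyPaths: string[]): string[] {
  // assume we run from the kibana root, so relative paths match the JSON
  return journeyPaths.filter((p) => !disabled.includes(relative(process.cwd(), p)));
}
```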
2023-02-21 17:58:50 +01:00
Dzmitry Lemechko
ff77de662d
[api capacity testing] Adjust endpoint limits (#149333)
## Summary

This PR tweaks the max load and thresholds for some APIs in order to get
more consistent
[results](c4b07f90-58eb-563e-8e59-3be09c8074f8?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-3d,to:now))).

Current main (last 3 days):
<img width="1534" alt="Screenshot 2023-01-24 at 14 53 15"
src="https://user-images.githubusercontent.com/10977896/214314206-0b849053-9473-4a98-9f3e-e90461a5a341.png">

PR (4 runs):

<img width="1569" alt="Screenshot 2023-01-24 at 15 00 49"
src="https://user-images.githubusercontent.com/10977896/214314530-60a5b3e4-2730-4307-946d-cf25d57d04ab.png">
2023-01-25 09:20:49 +01:00
Dzmitry Lemechko
5f31ebf1ce
Benchmark single apis (#146297)
## Summary

This PR adds the capability to run capacity testing for single APIs (#143066)

Currently in main we have two types of performance tests:
- a single-user performance journey that simulates a single end-user's
experience in the browser
- a scalability journey that uses APM traces from the single-user journey
to simulate the experience of multiple end-users

This new type of performance test allows us to better understand how each
individual server API scales under similar load.

How to run locally:
- make sure to clone the latest main branch of
[elastic/kibana-load-testing](https://github.com/elastic/kibana-load-testing)
- in the Kibana repo root, run:
```
node scripts/run_scalability.js --journey-path x-pack/test/scalability/apis/api.core.capabilities.json
```

How it works:
FTR is used to start Kibana/ES and run the Gatling simulation with the JSON
file as input. After the run, the latest report matching the journey name is
parsed to extract performance metrics, which are reported via EBT to the
Telemetry cluster.

How it will run after merge:
I plan to run the pipeline every 3 hours on a bare-metal machine and report
metrics to the Telemetry staging cluster.
<img width="2023" alt="image"
src="https://user-images.githubusercontent.com/10977896/208771628-f4f5dbcb-cb73-40c6-9aa1-4ec3fbf5285b.png">


APM traces are collected and reported to Kibana stats cluster:
<img width="1520" alt="image"
src="https://user-images.githubusercontent.com/10977896/208771323-4cca531a-eeea-4941-8b01-50b890f932b1.png">


What metrics are collected:

1. warmupAvgResponseTime - average response time during the warmup phase
2. rpsAtWarmup - average requests per second during the warmup phase
3. warmupDuration - duration of the warmup phase
4. responseTimeMetric (default: 85%) - Gatling reports response time
percentiles (25/50/75/80/85/90/95/99), as well as min/max values
5. threshold1ResponseTime (default: 3000 ms)
6. rpsAtThreshold1 - requests per second when `responseTimeMetric` first
reaches threshold1ResponseTime
7. threshold2ResponseTime (default: 9000 ms)
8. rpsAtThreshold2 - requests per second when `responseTimeMetric` first
reaches threshold2ResponseTime
9. threshold3ResponseTime (default: 15000 ms)
10. rpsAtThreshold3 - requests per second when `responseTimeMetric` first
reaches threshold3ResponseTime
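For reference, these map naturally onto a shape like the following (an illustrative sketch; the actual EBT event schema may differ):

```ts
// Sketch of the reported metrics, assuming one event per journey run.
interface ScalabilityMetricsEvent {
  journeyName: string;            // assumed identifier field
  warmupAvgResponseTime: number;  // ms, average during warmup
  rpsAtWarmup: number;            // requests per second during warmup
  warmupDuration: number;         // ms
  responseTimeMetric: string;     // e.g. '85%'
  threshold1ResponseTime: number; // ms, default 3000
  rpsAtThreshold1: number;        // rps when the metric first hits threshold 1
  threshold2ResponseTime: number; // ms, default 9000
  rpsAtThreshold2: number;        // rps when the metric first hits threshold 2
  threshold3ResponseTime: number; // ms, default 15000
  rpsAtThreshold3: number;        // rps when the metric first hits threshold 3
}
```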

Once we agree on the metrics, I will update the indexer for telemetry.

Co-authored-by: Alejandro Fernández Haro <alejandro.haro@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2023-01-09 16:38:30 +01:00
Spencer
afb09ccf8a
Transpile packages on demand, validate all TS projects (#146212)
## Dearest Reviewers 👋 

I've been working on this branch with @mistic and @tylersmalley and
we're really confident in these changes. Additionally, this changes code
in nearly every package in the repo so we don't plan to wait for reviews
to get in before merging this. If you'd like to have a concern
addressed, please feel free to leave a review, but assuming that nobody
raises a blocker in the next 24 hours we plan to merge this EOD pacific
tomorrow, 12/22.

We'll be paying close attention to any issues this causes after merging
and work on getting those fixed ASAP. 🚀

---

The operations team is not confident that we'll have the time to achieve
what we originally set out to accomplish by moving to Bazel with the
time and resources we have available. We have also bought ourselves some
headroom with improvements to babel-register, optimizer caching, and
typescript project structure.

In order to make sure we deliver packages as quickly as possible (many
teams really want them), with a usable and familiar developer
experience, this PR removes Bazel for building packages in favor of
using the same JIT transpilation we use for plugins.

Additionally, packages now use `kbn_references` (again, just copying the
dx from plugins to packages).

Because of the complex relationships between packages/plugins and in
order to prepare ourselves for automatic dependency detection tools we
plan to use in the future, this PR also introduces a "TS Project Linter"
which will validate that every tsconfig.json file meets a few
requirements:

1. the chain of base config files extended by each config includes
`tsconfig.base.json` and not `tsconfig.json`
2. the `include` config is used, and not `files`
3. the `exclude` config includes `target/**/*`
4. the `outDir` compiler option is specified as `target/types`
5. none of these compiler options are specified: `declaration`,
`declarationMap`, `emitDeclarationOnly`, `skipLibCheck`, `target`,
`paths`

6. all references to other packages/plugins use their pkg id, ie:
	
	```js
    // valid
    {
      "kbn_references": ["@kbn/core"]
    }
    // not valid
    {
      "kbn_references": [{ "path": "../../../src/core/tsconfig.json" }]
    }
    ```

7. only packages/plugins which are imported somewhere in the ts code are
listed in `kbn_references`
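Put together, a package tsconfig.json that satisfies all of these rules looks roughly like this (an illustrative sketch; real configs list their actual includes and references):

```js
{
  "extends": "../../tsconfig.base.json",
  "compilerOptions": {
    "outDir": "target/types"
  },
  "include": ["**/*.ts"],
  "exclude": ["target/**/*"],
  "kbn_references": ["@kbn/core"]
}
```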

This linter is not only validating all of the tsconfig.json files, but
it also will fix these config files to deal with just about any
violation that can be produced. Just run `node scripts/ts_project_linter
--fix` locally to apply these fixes, or let CI take care of
automatically fixing things and pushing the changes to your PR.

> **Example:** [`64e93e5`
(#146212)](64e93e5806)
When I merged main into my PR it included a change which removed the
`@kbn/core-injected-metadata-browser` package. After resolving the
conflicts I missed a few tsconfig files which included references to the
now removed package. The TS Project Linter identified that these
references were removed from the code and pushed a change to the PR to
remove them from the tsconfig.json files.

## No bazel? Does that mean no packages??
Nope! We're still doing packages but we're pretty sure now that we won't
be using Bazel to accomplish the 'distributed caching' and 'change-based
tasks' portions of the packages project.

This PR actually makes packages much easier to work with and will be
followed up with the bundling benefits described by the original
packages RFC. Then we'll work on documentation and advocacy for using
packages for any and all new code.

We're pretty confident that implementing distributed caching and
change-based tasks will be necessary in the future, but because of
recent improvements in the repo we think we can live without them for
**at least** a year.

## Wait, there are still BUILD.bazel files in the repo
Yes, there are still three webpack bundles which are built by Bazel: the
`@kbn/ui-shared-deps-npm` DLL, `@kbn/ui-shared-deps-src` externals, and
the `@kbn/monaco` workers. These three webpack bundles are still created
during bootstrap and remotely cached using bazel. The next phase of this
project is to figure out how to get the package bundling features
described in the RFC with the current optimizer, and we expect these
bundles to go away then. Until then any package that is used in those
three bundles still needs to have a BUILD.bazel file so that they can be
referenced by the remaining webpack builds.

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2022-12-22 19:00:29 -06:00
Dzmitry Lemechko
9f7db8f615
[scalability testing] typescript runner (#147002)
## Summary

Closes #146546

This PR replaces the bash script with a node-based runner script.

The script can take a relative path to a directory containing scalability
journey files, or a relative path to an individual journey JSON file:

```
node scripts/run_scalability.js --journey-config-path scalability_traces/server
node scripts/run_scalability.js --journey-config-path scalability_traces/server/api.core.capabilities.json
```
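A sketch of the path handling this implies (hypothetical helper, not the actual runner code):

```ts
// Hypothetical helper: expand a --journey-config-path argument into the
// list of journey JSON files to run.
import { readdirSync, statSync } from 'fs';
import { join } from 'path';

function resolveJourneyConfigs(journeyConfigPath: string): string[] {
  if (statSync(journeyConfigPath).isDirectory()) {
    return readdirSync(journeyConfigPath)
      .filter((name) => name.endsWith('.json'))
      .map((name) => join(journeyConfigPath, name));
  }
  return [journeyConfigPath];
}
```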

2022-12-06 22:21:52 +01:00
Spencer
50b3b57d9e
[ftr] add first-class support for playwright journeys (#140680)
* [ftr] add first-class support for playwright journeys

* [CI] Auto-commit changed files from 'node scripts/generate codeowners'

* fix jest test

* remove ability to customize kibana server args, if we need it we can add it back

* remove dev dir that doesn't exist

* fix typo

* prevent duplicated array conversion logic by sharing flag reader

* remove destructuring of option

* fix scalability config and config_path import

* fix start_servers args and tests

* include simple readme

* fix jest tests and support build re-use when changes are just to jest tests

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2022-09-22 01:06:46 -07:00