Update perf docs (#147533)

This PR updates docs around Kibana performance effort:

- how to create single user performance journeys and custom metrics with
EBT, and how to review test results
- how to create an API capacity test and where to find its test results
Dzmitry Lemechko 2023-01-18 17:56:05 +01:00 committed by GitHub
parent 5854bceb62
commit 363f4b7583
3 changed files with 246 additions and 54 deletions


@ -0,0 +1,134 @@
---
id: kibDevTutorialAddingApiCapacityTestingJourney
slug: /kibana-dev-docs/tutorial/performance/adding_api_capacity_testing_journey
title: Adding API Capacity Testing Journey
summary: Learn how to add an API capacity test
date: 2023-01-13
tags: ['kibana', 'onboarding', 'setup', 'performance', 'development', 'telemetry']
---
## Overview
It is important to test individual API endpoints for baseline performance, scalability, and breaking points. If an API doesn't meet performance requirements, it becomes a bottleneck.
These capacity tests track how response time changes while we slowly increase the number of concurrent requests per second.
Using this load model, we can identify how many requests per second each endpoint can handle while response time stays below a critical threshold.
The API capacity test defines 3 response time thresholds (defaults: 3000, 6000, 12000 ms). Test results report the rps (requests per second) reached at each threshold.
Test results are reported using EBT in the following format:
```json
{
"_index": "backing-kibana-server-scalability-metrics-000003",
"_source": {
"eventType": "scalability_metric",
"journeyName": "GET /internal/security/me",
"ciBuildId": "0185aace-821d-42af-97c7-5b2b029f94df",
"responseTimeMetric": "85%",
"kibanaVersion": "8.7.0",
"threshold1ResponseTime": 3000,
"rpsAtThreshold1": 586,
"threshold2ResponseTime": 6000,
"rpsAtThreshold2": 601,
"threshold3ResponseTime": 12000,
"rpsAtThreshold3": 705,
"warmupAvgResponseTime": 34,
...
}
}
```
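In the example above, the endpoint handled 586 requests per second before its 85th percentile response time crossed the 3000 ms threshold, 601 rps before 6000 ms, and 705 rps before 12000 ms.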
### Adding a new test
Create a new json file in `x-pack/test/scalability/apis` with required properties:
- **journeyName** is a test name, e.g. `GET /internal/security/session`
- **scalabilitySetup** is used to set load model
- **testData** is used to populate Elasticsearch and Kibana with test data
- **streams: [ {requests: [] }]** defines the API endpoint(s) to be called
`scalabilitySetup` includes warmup and test phases.
The warmup phase simulates 10 concurrent requests per second during a 30s period and is important for getting consistent results in the test phase.
The test phase simulates increasing the number of concurrent requests from `minUsersCount` to `maxUsersCount` within the `duration` time.
Both `maxUsersCount` and `duration` in the test phase should be adjusted for the individual endpoint:
- `maxUsersCount` should be high enough to reach the endpoint's limits
- `duration` should be long enough to ramp up requests at a low pace (1-2 requests per second)
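For instance, in the example below the test phase ramps from 10 to 700 users over 345s, which adds (700 - 10) / 345 ≈ 2 new requests per second and stays within that pace.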
Example:
```json
{
"journeyName": "GET /internal/security/session",
"scalabilitySetup": {
"warmup": [
{
"action": "constantUsersPerSec",
"userCount": 10,
"duration": "30s"
}
],
"test": [
{
"action": "rampUsersPerSec",
"minUsersCount": 10,
"maxUsersCount": 700,
"duration": "345s"
}
],
"maxDuration": "8m"
},
"testData": {
"esArchives": [],
"kbnArchives": []
},
"streams": [
{
"requests": [
{
"http": {
"method": "GET",
"path": "/internal/security/session",
"headers": {
"Cookie": "",
"Kbn-Version": "",
"Accept-Encoding": "gzip, deflate, br",
"Content-Type": "application/json"
},
"statusCode": 200
}
}
]
}
]
}
```
You can override the default response time thresholds by adding the following to `scalabilitySetup`:
```json
"responseTimeThreshold": {
"threshold1": 1000,
"threshold2": 2000,
"threshold3": 5000
},
```
### Running an API capacity journey locally
Clone the [kibana-load-testing](https://github.com/elastic/kibana-load-testing) repo.
Use the Node script from the Kibana root directory:
`node scripts/run_scalability_cli.js --journey-path x-pack/test/scalability/apis/$YOUR_JOURNEY_NAME.ts`
Use the `--kibana-install-dir` flag to run the test against a built Kibana distribution.
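For example (the install dir path here is just a placeholder):
```
node scripts/run_scalability_cli.js --journey-path x-pack/test/scalability/apis/$YOUR_JOURNEY_NAME.ts --kibana-install-dir /path/to/kibana-build
```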
### Benchmarking performance on CI
In order to keep track of performance metric stability, API capacity tests are run on the main branch at a scheduled interval.
Bare-metal machines are used to produce results that are as stable and reproducible as possible.
#### Machine specifications
All benchmarks are run on bare-metal machines with the [following specifications](https://www.hetzner.com/dedicated-rootserver/ex100):
CPU: Intel® Core™ i9-12900K 16 cores
RAM: 128 GB
SSD: 1.92 TB Data center Gen4 NVMe
#### Track performance results
APM metrics are reported to the [kibana-stats](https://kibana-stats.elastic.dev/) cluster.
You can filter transactions using labels, e.g. `labels.journeyName : "GET /internal/security/session"`
Custom metrics reported with EBT are available in the [Telemetry Staging](https://telemetry-v2-staging.elastic.dev/) cluster, in the `kibana-performance` space.
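For example, using the field names from the sample event shown earlier, you could filter one endpoint's capacity results in that space with a KQL query like:
```
eventType : "scalability_metric" and journeyName : "GET /internal/security/me"
```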


@ -1,34 +1,43 @@
---
id: kibDevTutorialAddingCustomPerformanceMetrics
slug: /kibana-dev-docs/tutorial/performance/adding_custom_performance_metrics
title: Adding Performance Metrics
summary: Learn how to instrument your code and analyze performance
date: 2023-01-13
tags: ['kibana', 'onboarding', 'setup', 'performance', 'development', 'telemetry']
---
# Build and track custom performance metrics
Having access to performance metrics allows us to better understand user experience across Kibana, identify issues, and fix them.
Custom metrics allow us to monitor critical flows like server start, saved objects fetching, or dashboard loading times.
## Instrument your code to report metric events
We use event-based telemetry (EBT) to report client-side metrics as events.
If you want to add a custom metric on the server side, please notify the #kibana-core team in advance.
Let's assume we intend to report the performance of a specific action called `APP_ACTION`.
In order to do so, we need to first measure the timing of that action. The [`performance.now()`](https://developer.mozilla.org/en-US/docs/Web/API/Performance/now) API can help with that:
```typescript
const actionStartTime = performance.now();
// action is started and finished
const actionDuration = performance.now() - actionStartTime; // Duration in milliseconds
```
Once we have the time measurement, we can use the `reportPerformanceMetricEvent` API to report it.
```typescript
import { reportPerformanceMetricEvent } from '@kbn/ebt-tools';

reportPerformanceMetricEvent(analytics, {
  eventName: APP_ACTION,
  duration: actionDuration, // Duration in milliseconds
});
```
After the journey run is finished, the metric will be delivered to the [Telemetry Staging](https://telemetry-v2-staging.elastic.dev/) cluster, alongside the event's context.
The data is updated periodically, so you might have to wait up to 30 minutes to see your data in the index.
Once indexed, this metric will appear in the `ebt-kibana` index. It is also mapped into an additional index, dedicated to performance metrics.
We recommend using the `Kibana Performance` space on the telemetry cluster, where you get an `index pattern` to easily access this data.
Each document in the index has the following structure:
```typescript
@ -64,7 +73,7 @@ Lets assume we are interested in benchmarking the performance of a more complex
- If data needs to be refreshed, it proceeds with a flow `load-data-from-api`.
- `PROCESS_DATA` loads and processes the data depending on the flow chosen in the previous step.
We could utilize the additional options supported by the `reportPerformanceMetricEvent` API:
```typescript
import { reportPerformanceMetricEvent } from '@kbn/ebt-tools';
@ -136,8 +145,7 @@ creating an event for cpuUsage does not bring any value because it doesn't bring
events in different places of code will have so much variability during performance analysis of your code. However it can be a nice attribute
to follow if it's important for you to look inside of a specific event e.g. `page-load`.
- Understand your events
- **Make sure that the event is clearly defined and consistent** (i.e. same code flow is executed each time).
Consider the start point and endpoint of the measurement and what happens between those points.
For example: an `app-data-load` event should not include the time it takes to render the data.
- **Choose event names wisely**.
@ -159,54 +167,19 @@ to follow if it's important for you to look inside of a specific event e.g. `pag
- **Keep performance in mind**. Reporting the performance of Kibana should never harm its own performance.
Avoid sending events too frequently (`onMouseMove`) or adding serialized JSON objects (whole `SavedObjects`) into the meta object.
### Analyzing journey results
The telemetry data will be reported to the Telemetry Staging cluster alongside the execution context.
Use the `context.labels.ciBuildName` label to filter down events to only those originating from performance runs and visualize the duration of events (or their breakdowns):
- Be sure to narrow your analysis down to performance events by specifying a filter `context.labels.ciBuildName: kibana-single-user-performance`.
Otherwise you might be looking at results originating from different hardware.
- You can look at the results of a specific journey by filtering on `context.labels.journeyName`.
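Combining both filters (the journey name here is just an example):
```
context.labels.ciBuildName : "kibana-single-user-performance" and context.labels.journeyName : "flight_dashboard"
```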
Please contact the #kibana-performance team if you need more help visualizing and tracking the results.
### Production performance tracking
All users who are opted in to report telemetry will start reporting event-based telemetry as well.
The data is available to be analyzed on the production telemetry cluster.
# Analytics Client


@ -0,0 +1,85 @@
---
id: kibDevTutorialAddingPerformanceJourney
slug: /kibana-dev-docs/tutorial/performance/adding_performance_journey
title: Adding Single User Performance Journey
summary: Learn how to add a journey and track Kibana performance
date: 2023-01-13
tags: ['kibana', 'onboarding', 'setup', 'performance', 'development']
---
## Overview
In order to achieve our goal of creating the best user experience in Kibana, it is important to keep track of its features' performance.
To make things easier, we introduced performance journeys that mimic end-user experience with Kibana.
A journey runs a flow of user interactions with Kibana in a browser and collects APM metrics for both the server and client side.
It is possible to instrument Kibana with [custom performance metrics](https://docs.elastic.dev/kibana-dev-docs/tutorials/performance/adding_custom_performance_metrics),
which will provide more detailed information about feature performance.
The journeys' core is the [kbn-journeys](packages/kbn-journeys/README.mdx) package. A journey is a functional test by design and is powered
by the [Playwright](https://playwright.dev/) end-to-end testing tool.
### Adding a new performance journey
Let's assume we instrumented the dashboard with load-time metrics and want to track the performance of the sample data flights dashboard.
Journeys support loading test data with esArchiver and/or kbnArchiver. Similar to functional tests, you might need to implement a custom wait
for UI rendering to be completed.
Simply create a new file in `x-pack/performance/journeys` with the following code:
```typescript
// imports below match those used by existing journeys in x-pack/performance/journeys
import { Journey } from '@kbn/journeys';
import { subj } from '@kbn/test-subj-selector';

import { waitForVisualizations } from '../utils';

export const journey = new Journey({
  esArchives: ['x-pack/performance/es_archives/sample_data_flights'],
  kbnArchives: ['x-pack/performance/kbn_archives/flights_no_map_dashboard'],
})
  .step('Go to Dashboards Page', async ({ page, kbnUrl }) => {
    await page.goto(kbnUrl.get(`/app/dashboards`));
    await page.waitForSelector('#dashboardListingHeading');
  })
  .step('Go to Flights Dashboard', async ({ page, log }) => {
    await page.click(subj('dashboardListingTitleLink-[Flights]-Global-Flight-Dashboard'));
    await waitForVisualizations(page, log, 14);
  });
```
In order to get correct and consistent metrics, it is important to design the journey properly:
- use archives to generate test data
- decouple complex scenarios into multiple simple journeys
- wait for page loading / UI components rendering to complete
- test locally and check that the journey is stable
- make sure performance metrics are collected on every run
### Running a performance journey locally for troubleshooting purposes
Use the Node script:
`node scripts/run_performance.js --journey-path x-pack/performance/journeys/$YOUR_JOURNEY_NAME.ts`
The script's steps include:
- start Elasticsearch
- start Kibana and run the journey for the first time (warmup): only APM metrics are reported
- start Kibana and run the journey a second time (test): both EBT and APM metrics are reported
- stop Elasticsearch

You can skip the warmup phase for debugging purposes by using the `--skip-warmup` flag.
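For example:
```
node scripts/run_performance.js --journey-path x-pack/performance/journeys/$YOUR_JOURNEY_NAME.ts --skip-warmup
```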
Since the tests are run on a local machine, there is also realistic throttling applied to the network to
simulate a real-life internet connection. This means that all requests have a fixed latency and limited bandwidth.
### Benchmarking performance on CI
In order to keep track of performance metric stability, journeys are run on the main branch at a scheduled interval.
Bare-metal machines are used to produce results that are as stable and reproducible as possible.
#### Machine specifications
All benchmarks are run on bare-metal machines with the [following specifications](https://www.hetzner.com/dedicated-rootserver/ex100):
CPU: Intel® Core™ i9-12900K 16 cores
RAM: 128 GB
SSD: 1.92 TB Data center Gen4 NVMe
#### Track performance results
APM metrics are reported to the [kibana-ops-e2e-perf](https://kibana-ops-e2e-perf.kb.us-central1.gcp.cloud.es.io/) cluster.
You can filter transactions using labels, e.g. `labels.journeyName : "flight_dashboard"`
Custom metrics reported with EBT are available in the [Telemetry Staging](https://telemetry-v2-staging.elastic.dev/) cluster, in the `kibana-performance` space.