Update perf docs (#147533)

This PR updates docs around Kibana performance effort:

- how to create single user performance journeys and custom metrics with
EBT, and how to review test results
- how to create an API capacity test and where to find its test results
Dzmitry Lemechko 2023-01-18 17:56:05 +01:00 committed by GitHub
parent 5854bceb62
commit 363f4b7583
3 changed files with 246 additions and 54 deletions


@ -0,0 +1,134 @@
---
id: kibDevTutorialAddingApiCapacityTestingJourney
slug: /kibana-dev-docs/tutorial/performance/adding_api_capacity_testing_journey
title: Adding API Capacity Testing Journey
summary: Learn how to add an API capacity test
date: 2023-01-13
tags: ['kibana', 'onboarding', 'setup', 'performance', 'development', 'telemetry']
---
## Overview
It is important to test individual API endpoints for baseline performance, scalability, and breaking points. If an API doesn't meet performance requirements, it becomes a bottleneck.
These capacity tests track how response time changes while we slowly increase the number of concurrent requests per second.
Using this load model, we can identify how many requests per second each endpoint can handle while response time stays below a critical threshold.
The API capacity test defines 3 response time thresholds (defaults: 3000, 6000, 12000 ms). Test results report the rps (requests per second) reached at each threshold.
Test results are reported using EBT in the following format:
```json
{
"_index": "backing-kibana-server-scalability-metrics-000003",
"_source": {
"eventType": "scalability_metric",
"journeyName": "GET /internal/security/me",
"ciBuildId": "0185aace-821d-42af-97c7-5b2b029f94df",
"responseTimeMetric": "85%",
"kibanaVersion": "8.7.0",
"threshold1ResponseTime": 3000,
"rpsAtThreshold1": 586,
"threshold2ResponseTime": 6000,
"rpsAtThreshold2": 601,
"threshold3ResponseTime": 12000,
"rpsAtThreshold3": 705,
"warmupAvgResponseTime": 34,
...
}
}
```
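In the example above, the endpoint handled 586 requests per second before its 85th percentile response time crossed the 3000 ms threshold, 601 rps before 6000 ms, and 705 rps before 12000 ms.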
### Adding a new test
Create a new json file in `x-pack/test/scalability/apis` with required properties:
- **journeyName** is a test name, e.g. `GET /internal/security/session`
- **scalabilitySetup** is used to set load model
- **testData** is used to populate Elasticsearch and Kibana with test data
- **streams: [ {requests: [] }]** defines the API endpoint(s) to be called
`scalabilitySetup` includes warmup and test phases.
The warmup phase simulates 10 concurrent requests per second during a 30s period and is important for getting consistent results in the test phase.
The test phase simulates increasing the number of concurrent requests from `minUsersCount` to `maxUsersCount` within the `duration` time.
Both `maxUsersCount` and `duration` in the test phase should be adjusted for the individual endpoint:
- `maxUsersCount` should be high enough to reach the endpoint's limits
- `duration` should be long enough to ramp up requests at a low pace (1-2 requests per second)
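For instance, in the example below the test phase ramps from 10 to 700 users over 345s, which adds (700 - 10) / 345 ≈ 2 new requests per second and stays within that pace.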
Example:
```json
{
"journeyName": "GET /internal/security/session",
"scalabilitySetup": {
"warmup": [
{
"action": "constantUsersPerSec",
"userCount": 10,
"duration": "30s"
}
],
"test": [
{
"action": "rampUsersPerSec",
"minUsersCount": 10,
"maxUsersCount": 700,
"duration": "345s"
}
],
"maxDuration": "8m"
},
"testData": {
"esArchives": [],
"kbnArchives": []
},
"streams": [
{
"requests": [
{
"http": {
"method": "GET",
"path": "/internal/security/session",
"headers": {
"Cookie": "",
"Kbn-Version": "",
"Accept-Encoding": "gzip, deflate, br",
"Content-Type": "application/json"
},
"statusCode": 200
}
}
]
}
]
}
```
You can override the default response time thresholds by adding the following to `scalabilitySetup`:
```json
"responseTimeThreshold": {
"threshold1": 1000,
"threshold2": 2000,
"threshold3": 5000
},
```
### Running an API capacity journey locally
Clone the [kibana-load-testing](https://github.com/elastic/kibana-load-testing) repo.
Use the Node script from the Kibana root directory:
`node scripts/run_scalability_cli.js --journey-path x-pack/test/scalability/apis/$YOUR_JOURNEY_NAME.ts`
Use the `--kibana-install-dir` flag to run the test against a built Kibana distribution.
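For example (the install dir path here is just a placeholder):
```
node scripts/run_scalability_cli.js --journey-path x-pack/test/scalability/apis/$YOUR_JOURNEY_NAME.ts --kibana-install-dir /path/to/kibana-build
```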
### Benchmarking performance on CI
In order to keep track of performance metric stability, API capacity tests are run on the main branch at a scheduled interval.
Bare-metal machines are used to produce results that are as stable and reproducible as possible.
#### Machine specifications
All benchmarks are run on bare-metal machines with the [following specifications](https://www.hetzner.com/dedicated-rootserver/ex100):
CPU: Intel® Core™ i9-12900K 16 cores
RAM: 128 GB
SSD: 1.92 TB Data center Gen4 NVMe
#### Track performance results
APM metrics are reported to the [kibana-stats](https://kibana-stats.elastic.dev/) cluster.
You can filter transactions using labels, e.g. `labels.journeyName : "GET /internal/security/session"`
Custom metrics reported with EBT are available in the [Telemetry Staging](https://telemetry-v2-staging.elastic.dev/) cluster, in the `kibana-performance` space.
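For example, using the field names from the sample event shown earlier, you could filter one endpoint's capacity results in that space with a KQL query like:
```
eventType : "scalability_metric" and journeyName : "GET /internal/security/me"
```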


@ -1,34 +1,43 @@
---
id: kibDevTutorialAddingCustomPerformanceMetrics
slug: /kibana-dev-docs/tutorial/performance/adding_custom_performance_metrics
title: Adding Performance Metrics
summary: Learn how to instrument your code and analyze performance
date: 2023-01-13
tags: ['kibana', 'onboarding', 'setup', 'performance', 'development', 'telemetry']
---
# Build and track custom performance metrics
Having access to performance metrics allows us to better understand user experience across Kibana, identify issues, and fix them.
Custom metrics allow us to monitor critical flows like server start, saved objects fetching, or dashboard loading times.
## Instrument your code to report metric events
We use event-based telemetry (EBT) to report client-side metrics as events.
If you want to add a custom metric on the server side, please notify the #kibana-core team in advance.
Let's assume we intend to report the performance of a specific action called `APP_ACTION`.
In order to do so, we need to first measure the timing of that action. The [`performance.now()`](https://developer.mozilla.org/en-US/docs/Web/API/Performance/now) API can help with that:
```typescript
const actionStartTime = performance.now();
// action is started and finished
const actionDuration = performance.now() - actionStartTime; // Duration in milliseconds
```
Once we have the time measurement, we can use the `reportPerformanceMetricEvent` API to report it.
```typescript
import { reportPerformanceMetricEvent } from '@kbn/ebt-tools';

reportPerformanceMetricEvent(analytics, {
  eventName: APP_ACTION,
  duration: actionDuration, // Duration in milliseconds
});
```
After the journey run is finished, the metric will be delivered to the [Telemetry Staging](https://telemetry-v2-staging.elastic.dev/) cluster, alongside the event's context.
The data is updated periodically, so you might have to wait up to 30 minutes to see your data in the index.
Once indexed, this metric will appear in the `ebt-kibana` index. It is also mapped into an additional index, dedicated to performance metrics.
We recommend using the `Kibana Performance` space on the telemetry cluster, where you get an `index pattern` to easily access this data.
Each document in the index has the following structure:
```typescript
@ -64,7 +73,7 @@ Lets assume we are interested in benchmarking the performance of a more complex
- If data needs to be refreshed, it proceeds with a flow `load-data-from-api`.
- `PROCESS_DATA` loads and processes the data depending on the flow chosen in the previous step.
We could utilize the additional options supported by the `reportPerformanceMetricEvent` API:
```typescript
import { reportPerformanceMetricEvent } from '@kbn/ebt-tools';
@ -136,8 +145,7 @@ creating an event for cpuUsage does not bring any value because it doesn't bring
events in different places of code will have so much variability during performance analysis of your code. However it can be a nice attribute
to follow if it's important for you to look inside of a specific event e.g. `page-load`.
- Understand your events
- **Make sure that the event is clearly defined and consistent** (i.e. same code flow is executed each time).
Consider the start point and endpoint of the measurement and what happens between those points.
For example: an `app-data-load` event should not include the time it takes to render the data.
- **Choose event names wisely**.
@ -159,54 +167,19 @@ to follow if it's important for you to look inside of a specific event e.g. `pag
- **Keep performance in mind**. Reporting the performance of Kibana should never harm its own performance.
Avoid sending events too frequently (`onMouseMove`) or adding serialized JSON objects (whole `SavedObjects`) into the meta object.
### Analyzing journey results
The telemetry data will be reported to the Telemetry Staging cluster alongside the execution context.
Use the `context.labels.ciBuildName` label to filter down events to only those originating from performance runs and visualize the duration of events (or their breakdowns):
- Be sure to narrow your analysis down to performance events by specifying a filter `context.labels.ciBuildName: kibana-single-user-performance`.
Otherwise you might be looking at results originating from different hardware.
- You can look at the results of a specific journey by filtering on `context.labels.journeyName`.
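Combining both filters (the journey name here is just an example):
```
context.labels.ciBuildName : "kibana-single-user-performance" and context.labels.journeyName : "flight_dashboard"
```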
Please contact the #kibana-performance team if you need more help visualizing and tracking the results.
### Production performance tracking
All users who are opted in to report telemetry will start reporting event-based telemetry as well.
The data is available to be analyzed on the production telemetry cluster.
# Analytics Client


@ -0,0 +1,85 @@
---
id: kibDevTutorialAddingPerformanceJourney
slug: /kibana-dev-docs/tutorial/performance/adding_performance_journey
title: Adding Single User Performance Journey
summary: Learn how to add a journey and track Kibana performance
date: 2023-01-13
tags: ['kibana', 'onboarding', 'setup', 'performance', 'development']
---
## Overview
In order to achieve our goal of creating the best user experience in Kibana, it is important to keep track of its features' performance.
To make things easier, we introduced performance journeys that mimic end-user experience with Kibana.
A journey runs a flow of user interactions with Kibana in a browser and collects APM metrics for both the server and client side.
It is possible to instrument Kibana with [custom performance metrics](https://docs.elastic.dev/kibana-dev-docs/tutorials/performance/adding_custom_performance_metrics),
which will provide more detailed information about feature performance.
The journeys' core is the [kbn-journeys](packages/kbn-journeys/README.mdx) package. A journey is a functional test by design and is powered
by the [Playwright](https://playwright.dev/) end-to-end testing tool.
### Adding a new performance journey
Let's assume we instrumented the dashboard with load-time metrics and want to track the performance of the sample data flights dashboard.
Journeys support loading test data with esArchiver and/or kbnArchiver. Similar to functional tests, you might need to implement a custom wait
for UI rendering to be completed.
Simply create a new file in `x-pack/performance/journeys` with the following code:
```typescript
// imports below match those used by existing journeys in x-pack/performance/journeys
import { Journey } from '@kbn/journeys';
import { subj } from '@kbn/test-subj-selector';

import { waitForVisualizations } from '../utils';

export const journey = new Journey({
  esArchives: ['x-pack/performance/es_archives/sample_data_flights'],
  kbnArchives: ['x-pack/performance/kbn_archives/flights_no_map_dashboard'],
})
  .step('Go to Dashboards Page', async ({ page, kbnUrl }) => {
    await page.goto(kbnUrl.get(`/app/dashboards`));
    await page.waitForSelector('#dashboardListingHeading');
  })
  .step('Go to Flights Dashboard', async ({ page, log }) => {
    await page.click(subj('dashboardListingTitleLink-[Flights]-Global-Flight-Dashboard'));
    await waitForVisualizations(page, log, 14);
  });
```
In order to get correct and consistent metrics, it is important to design the journey properly:
- use archives to generate test data
- decouple complex scenarios into multiple simple journeys
- wait for page loading / UI components rendering to complete
- test locally and check that the journey is stable
- make sure performance metrics are collected on every run
### Running a performance journey locally for troubleshooting purposes
Use the Node script:
`node scripts/run_performance.js --journey-path x-pack/performance/journeys/$YOUR_JOURNEY_NAME.ts`
The script's steps include:
- start Elasticsearch
- start Kibana and run the journey for the first time (warmup): only APM metrics are reported
- start Kibana and run the journey a second time (test): both EBT and APM metrics are reported
- stop Elasticsearch

You can skip the warmup phase for debugging purposes by using the `--skip-warmup` flag.
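For example:
```
node scripts/run_performance.js --journey-path x-pack/performance/journeys/$YOUR_JOURNEY_NAME.ts --skip-warmup
```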
Since the tests are run on a local machine, there is also realistic throttling applied to the network to
simulate a real-life internet connection. This means that all requests have a fixed latency and limited bandwidth.
### Benchmarking performance on CI
In order to keep track of performance metric stability, journeys are run on the main branch at a scheduled interval.
Bare-metal machines are used to produce results that are as stable and reproducible as possible.
#### Machine specifications
All benchmarks are run on bare-metal machines with the [following specifications](https://www.hetzner.com/dedicated-rootserver/ex100):
CPU: Intel® Core™ i9-12900K 16 cores
RAM: 128 GB
SSD: 1.92 TB Data center Gen4 NVMe
#### Track performance results
APM metrics are reported to the [kibana-ops-e2e-perf](https://kibana-ops-e2e-perf.kb.us-central1.gcp.cloud.es.io/) cluster.
You can filter transactions using labels, e.g. `labels.journeyName : "flight_dashboard"`
Custom metrics reported with EBT are available in the [Telemetry Staging](https://telemetry-v2-staging.elastic.dev/) cluster, in the `kibana-performance` space.