Update perf docs (#147533)
This PR updates the docs around the Kibana performance effort:
- how to create single-user performance journeys and custom metrics with EBT, and how to review test results
- how to create an API capacity test and where to find its test results
commit 363f4b7583 (parent 5854bceb62): 3 changed files with 246 additions and 54 deletions
dev_docs/tutorials/performance/adding_api_capacity_test.mdx (new file, 134 lines)
@@ -0,0 +1,134 @@
---
id: kibDevTutorialAddingApiCapacityTestingJourney
slug: /kibana-dev-docs/tutorial/performance/adding_api_capacity_testing_journey
title: Adding API Capacity Testing Journey
summary: Learn how to add an API capacity test
date: 2023-01-13
tags: ['kibana', 'onboarding', 'setup', 'performance', 'development', 'telemetry']
---

## Overview
It is important to test individual API endpoints for baseline performance, scalability, and breaking points. If an API endpoint does not meet performance requirements, it is a bottleneck.
These capacity tests track how response time changes while the number of concurrent requests per second is slowly increased.
Because every endpoint uses a similar load model, we can identify how many requests per second each endpoint can sustain while its response time stays below a critical threshold.

An API capacity test defines 3 response time thresholds in ms (defaults: 3000, 6000 and 12000). Test results report the rps (requests per second) reached at each threshold.

Test results are reported using EBT in the following format:
```json
{
  "_index": "backing-kibana-server-scalability-metrics-000003",
  "_source": {
    "eventType": "scalability_metric",
    "journeyName": "GET /internal/security/me",
    "ciBuildId": "0185aace-821d-42af-97c7-5b2b029f94df",
    "responseTimeMetric": "85%",
    "kibanaVersion": "8.7.0",
    "threshold1ResponseTime": 3000,
    "rpsAtThreshold1": 586,
    "threshold2ResponseTime": 6000,
    "rpsAtThreshold2": 601,
    "threshold3ResponseTime": 12000,
    "rpsAtThreshold3": 705,
    "warmupAvgResponseTime": 34,
    ...
  }
}
```
In this example, the endpoint sustained 586 rps before the 85th percentile response time exceeded the 3000 ms threshold.

### Adding a new test
Create a new JSON file in `x-pack/test/scalability/apis` with the required properties:
- **journeyName** is the test name, e.g. `GET /internal/security/session`
- **scalabilitySetup** sets the load model
- **testData** populates Elasticsearch and Kibana with test data
- **streams: [ {requests: [] }]** defines the API endpoint(s) to be called

`scalabilitySetup` includes warmup and test phases.
The warmup phase simulates 10 concurrent requests over a 30s period and is important for getting consistent results in the test phase.
The test phase simulates an increasing number of concurrent requests, from `minUsersCount` to `maxUsersCount`, within the `duration` time.
Both `maxUsersCount` and `duration` in the test phase should be adjusted per endpoint:
- `maxUsersCount` should be high enough to reach the endpoint's limits
- `duration` should be long enough to keep the ramp-up pace low (1-2 extra requests per second); in the example below, ramping from 10 to 700 users over 345s adds 2 users per second

Example:
```json
{
  "journeyName": "GET /internal/security/session",
  "scalabilitySetup": {
    "warmup": [
      {
        "action": "constantUsersPerSec",
        "userCount": 10,
        "duration": "30s"
      }
    ],
    "test": [
      {
        "action": "rampUsersPerSec",
        "minUsersCount": 10,
        "maxUsersCount": 700,
        "duration": "345s"
      }
    ],
    "maxDuration": "8m"
  },
  "testData": {
    "esArchives": [],
    "kbnArchives": []
  },
  "streams": [
    {
      "requests": [
        {
          "http": {
            "method": "GET",
            "path": "/internal/security/session",
            "headers": {
              "Cookie": "",
              "Kbn-Version": "",
              "Accept-Encoding": "gzip, deflate, br",
              "Content-Type": "application/json"
            },
            "statusCode": 200
          }
        }
      ]
    }
  ]
}
```

Override the default response time thresholds by adding `responseTimeThreshold` to `scalabilitySetup`:
```json
"responseTimeThreshold": {
  "threshold1": 1000,
  "threshold2": 2000,
  "threshold3": 5000
},
```

### Running an API capacity journey locally
Clone the [kibana-load-testing](https://github.com/elastic/kibana-load-testing) repo.

Use the Node script from the Kibana root directory:
`node scripts/run_scalability_cli.js --journey-path x-pack/test/scalability/apis/$YOUR_JOURNEY_NAME.json`

Use the `--kibana-install-dir` flag to test against a build, as shown below.
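For example, to run a test against a pre-built Kibana distribution (the install path below is a placeholder):

```
node scripts/run_scalability_cli.js \
  --journey-path x-pack/test/scalability/apis/$YOUR_JOURNEY_NAME.json \
  --kibana-install-dir /path/to/kibana-build
```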

### Benchmarking performance on CI
To keep track of performance metrics stability, API capacity tests are run against the main branch on a scheduled interval.
A bare-metal machine is used to produce results that are as stable and reproducible as possible.

#### Machine specifications

All benchmarks are run on bare-metal machines with the [following specifications](https://www.hetzner.com/dedicated-rootserver/ex100):

CPU: Intel® Core™ i9-12900K, 16 cores
RAM: 128 GB
SSD: 1.92 TB Data Center Gen4 NVMe

#### Track performance results
APM metrics are reported to the [kibana-stats](https://kibana-stats.elastic.dev/) cluster.
You can filter transactions using labels, e.g. `labels.journeyName : "GET /internal/security/session"`.

Custom metrics reported with EBT are available in the [Telemetry Staging](https://telemetry-v2-staging.elastic.dev/) cluster, `kibana-performance` space.

@@ -1,34 +1,43 @@
 ---
-id: kibDevTutorialAddingPerformanceMetrics
-slug: /kibana-dev-docs/tutorial/adding_performance_metrics
+id: kibDevTutorialAddingCustomPerformanceMetrics
+slug: /kibana-dev-docs/tutorial/performance/adding_custom_performance_metrics
 title: Adding Performance Metrics
 summary: Learn how to instrument your code and analyze performance
-date: 2022-07-07
+date: 2023-01-13
 tags: ['kibana', 'onboarding', 'setup', 'performance', 'development', 'telemetry']
 ---

-## Reporting performance events
+# Build and track custom performance metrics
+Having access to performance metrics allows us to better understand the user experience across Kibana, identify issues and fix them.
+Custom metrics make it possible to monitor critical flows such as server start, saved object fetching or dashboard loading times.

-### Simple performance events
+## Instrument your code to report metric events
+We use event-based telemetry (EBT) to report client-side metrics as events.
+If you want to add a custom metric on the server side, please notify the #kibana-core team in advance.

 Let's assume we intend to report the performance of a specific action called `APP_ACTION`.
-In order to do so, we need to first measure the timing of that action.
-Once we have the time measurement, we can use the `reportPerformanceMetricEvent` API to report it.
+In order to do so, we first need to measure the timing of that action. The [`performance.now()`](https://developer.mozilla.org/en-US/docs/Web/API/Performance/now) API can help with that:

-The most basic form of reporting would be:
 ```typescript
+const actionStartTime = performance.now();
+// ... the action starts and finishes ...
+const actionDuration = performance.now() - actionStartTime; // duration in milliseconds
+```
+
+Once we have the time measurement, we can use the `reportPerformanceMetricEvent` API to report it:
+
+```typescript
 reportPerformanceMetricEvent(analytics, {
   eventName: APP_ACTION,
-  duration, // Duration in milliseconds
+  duration: actionDuration,
 });
 ```
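If the same measure-and-report pattern appears in several places, it can be wrapped in a small utility. A minimal sketch (the `withPerformanceReport` helper and the `AnalyticsClient` import path are illustrative assumptions, not an existing Kibana API):

```typescript
import type { AnalyticsClient } from '@kbn/analytics-client';
import { reportPerformanceMetricEvent } from '@kbn/ebt-tools';

// Hypothetical helper: time an async action and report its duration as an
// EBT performance metric event, even if the action throws.
async function withPerformanceReport<T>(
  analytics: AnalyticsClient,
  eventName: string,
  action: () => Promise<T>
): Promise<T> {
  const start = performance.now();
  try {
    return await action();
  } finally {
    reportPerformanceMetricEvent(analytics, {
      eventName,
      duration: performance.now() - start,
    });
  }
}
```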

-Once executed, the metric would be delivered to the `stack-telemetry` cluster, alongside with the event's context.
+After the journey run is finished, the metric will be delivered to the [Telemetry Staging](https://telemetry-v2-staging.elastic.dev/) cluster, along with the event's context.
 The data is updated periodically, so you might have to wait up to 30 minutes to see your data in the index.

 Once indexed, this metric will appear in the `ebt-kibana` index. It is also mapped into an additional index, dedicated to performance metrics.
-We recommend using the `Kibana Peformance` space on the telemetry cluster, where you get an `index patten` to easily access this data.
+We recommend using the `Kibana Performance` space on the telemetry cluster, where you get an `index pattern` to easily access this data.
 Each document in the index has the following structure:

 ```typescript

@@ -64,7 +73,7 @@ Let's assume we are interested in benchmarking the performance of a more complex
 - If data needs to be refreshed, it proceeds with the `load-data-from-api` flow.
 - `PROCESS_DATA` loads and processes the data depending on the flow chosen in the previous step.

-We could utilise the additional options supported by the `reportPerformanceMetricEvent` API:
+We could utilize the additional options supported by the `reportPerformanceMetricEvent` API:

 ```typescript
 import { reportPerformanceMetricEvent } from '@kbn/ebt-tools';

@@ -136,8 +145,7 @@ creating an event for cpuUsage does not bring any value because it doesn't bring
 events in different places of code will have so much variability during performance analysis of your code. However, it can be a nice attribute
 to follow if it's important for you to look inside a specific event, e.g. `page-load`.

-- Understand your events
-  **Make sure that the event is clearly defined and consistent** (i.e. same code flow is executed each time).
+- **Make sure that the event is clearly defined and consistent** (i.e. the same code flow is executed each time).
   Consider the start point and endpoint of the measurement and what happens between those points.
   For example: an `app-data-load` event should not include the time it takes to render the data.
 - **Choose event names wisely**.

@@ -159,54 +167,19 @@ to follow if it's important for you to look inside of a specific event e.g. `pag
 - **Keep performance in mind**. Reporting the performance of Kibana should never harm its own performance.
   Avoid sending events too frequently (`onMouseMove`) or adding serialized JSON objects (whole `SavedObjects`) into the meta object.

-### Benchmarking performance on CI
-
-One of the use cases for event based telemetry is benchmarking the performance of features over time.
-In order to keep track of their stability, the #kibana-performance team has developed a special set of
-functional tests called `Journeys`. These journeys execute a UI workflow and allow the telemetry to be
-reported to a cluster where it can then be analysed.
-
-Those journeys run on the key branches (main, release versions) on dedicated machines to produce results
-as stable and reproducible as possible.
-
-#### Machine specifications
-
-All benchmarks are run on bare-metal machines with the [following specifications](https://www.hetzner.com/dedicated-rootserver/ex100):
-
-CPU: Intel® Core™ i9-12900K
-RAM: 128 GB
-SSD: 1.92 TB Datacenter Gen4 NVMe
-
-Since the tests are run on a local machine, there is also realistic throttling applied to the network to
-simulate real life internet connection. This means that all requests have a [fixed latency and limited bandwidth](https://github.com/elastic/kibana/blob/main/x-pack/test/performance/services/performance.ts#L157).
-
-#### Journey implementation
-
-If you would like to keep track of the stability of your events, implement a journey by adding a functional
-test to the `x-pack/test/performance/journeys` folder.
-
-The telemetry reported during the execution of those journeys will be reported to the `telemetry-v2-staging` cluster
-alongside with execution context. Use the `context.labels.ciBuildName` label to filter down events to only those originating
-from performance runs and visualize the duration of events (or their breakdowns).
-
-Run the test locally for troubleshooting purposes by running
-
-```
-node scripts/functional_tests --config x-pack/performance/journeys/$YOUR_JOURNEY_NAME.ts
-```
-
-#### Analyzing journey results
-
+### Analyzing journey results
+The telemetry data will be reported to the Telemetry Staging cluster along with the execution context.
+Use the `context.labels.ciBuildName` label to filter events down to only those originating from performance runs and visualize the duration of events (or their breakdowns):
+- Be sure to narrow your analysis down to performance events by specifying the filter `context.labels.ciBuildName: kibana-single-user-performance`.
+  Otherwise you might be looking at results originating from different hardware.
+- You can look at the results of a specific journey by filtering on `context.labels.journeyName`.

-Please contact the #kibana-performance team if you need more help visualising and tracking the results.
+Please contact the #kibana-performance team if you need more help visualizing and tracking the results.

 ### Production performance tracking

 All users who are opted in to report telemetry will start reporting event based telemetry as well.
-The data is available to be analysed on the production telemetry cluster.
+The data is available to be analyzed on the production telemetry cluster.

 # Analytics Client

@@ -0,0 +1,85 @@
---
id: kibDevTutorialAddingPerformanceJourney
slug: /kibana-dev-docs/tutorial/performance/adding_performance_journey
title: Adding Single User Performance Journey
summary: Learn how to add a journey and track Kibana performance
date: 2023-01-13
tags: ['kibana', 'onboarding', 'setup', 'performance', 'development']
---

## Overview
In order to achieve our goal of creating the best user experience in Kibana, it is important to keep track of the performance of its features.
To make things easier, we introduced performance journeys, which mimic the end-user experience with Kibana.

A journey runs a flow of user interactions with Kibana in a browser and collects APM metrics for both the server and client side.
It is possible to instrument Kibana with [custom performance metrics](https://docs.elastic.dev/kibana-dev-docs/tutorials/performance/adding_custom_performance_metrics),
which provide more detailed information about feature performance.

The journeys' core is the [kbn-journeys](packages/kbn-journeys/README.mdx) package. It is a functional test by design and is powered
by the [Playwright](https://playwright.dev/) end-to-end testing tool.

### Adding a new performance journey
Let's assume we have instrumented the dashboard with load time metrics and want to track the performance of the sample data flights dashboard.
Journeys support loading test data with esArchiver or kbnArchiver. Similar to functional tests, a journey might require implementing a custom wait
for UI rendering to complete (see the sketch after the example below).

Simply create a new file in `x-pack/performance/journeys` with the following code:

```typescript
import { Journey } from '@kbn/journeys';
import { subj } from '@kbn/test-subj-selector';
// Shared helper from the performance journeys utils; the exact path may differ.
import { waitForVisualizations } from '../utils';

export const journey = new Journey({
  esArchives: ['x-pack/performance/es_archives/sample_data_flights'],
  kbnArchives: ['x-pack/performance/kbn_archives/flights_no_map_dashboard'],
})

  .step('Go to Dashboards Page', async ({ page, kbnUrl }) => {
    await page.goto(kbnUrl.get(`/app/dashboards`));
    await page.waitForSelector('#dashboardListingHeading');
  })

  .step('Go to Flights Dashboard', async ({ page, log }) => {
    await page.click(subj('dashboardListingTitleLink-[Flights]-Global-Flight-Dashboard'));
    await waitForVisualizations(page, log, 14);
  });
```
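
A custom wait like `waitForVisualizations` above typically polls the page until the expected number of visualizations report that they have finished rendering. A minimal sketch of such a helper, assuming Kibana's `data-render-complete` attribute (the real shared helper in the journeys utils may differ):

```typescript
import type { Page } from 'playwright';
import type { ToolingLog } from '@kbn/tooling-log';

// Sketch: wait until at least `count` visualizations on the page
// have set data-render-complete="true".
export async function waitForVisualizations(page: Page, log: ToolingLog, count: number) {
  log.debug(`waiting for ${count} visualizations to finish rendering`);
  await page.waitForFunction(
    (expected) =>
      document.querySelectorAll('[data-render-complete="true"]').length >= expected,
    count
  );
}
```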

In order to get correct and consistent metrics, it is important to design the journey properly:
- use archives to generate test data
- decouple complex scenarios into multiple simple journeys
- wait for page loading / UI component rendering to complete
- test locally and check that the journey is stable
- make sure performance metrics are collected on every run

### Running a performance journey locally for troubleshooting purposes
Use the Node script:
`node scripts/run_performance.js --journey-path x-pack/performance/journeys/$YOUR_JOURNEY_NAME.ts`

The script's steps include:
- start Elasticsearch
- start Kibana and run the journey the first time (warmup): only APM metrics are reported
- start Kibana and run the journey a second time (test): both EBT and APM metrics are reported
- stop Elasticsearch

You can skip the warmup phase for debugging purposes by using the `--skip-warmup` flag, as shown below.
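For example, to iterate on a journey quickly without the warmup run:

```
node scripts/run_performance.js --journey-path x-pack/performance/journeys/$YOUR_JOURNEY_NAME.ts --skip-warmup
```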

Since the tests are run on a local machine, realistic network throttling is also applied to
simulate a real-life internet connection. This means that all requests have a fixed latency and limited bandwidth.

### Benchmarking performance on CI
To keep track of performance metrics stability, journeys are run against the main branch on a scheduled interval.
A bare-metal machine is used to produce results that are as stable and reproducible as possible.

#### Machine specifications

All benchmarks are run on bare-metal machines with the [following specifications](https://www.hetzner.com/dedicated-rootserver/ex100):

CPU: Intel® Core™ i9-9900K, 8 cores
RAM: 128 GB
SSD: 1.92 TB Data Center Gen4 NVMe

#### Track performance results
APM metrics are reported to the [kibana-ops-e2e-perf](https://kibana-ops-e2e-perf.kb.us-central1.gcp.cloud.es.io/) cluster.
You can filter transactions using labels, e.g. `labels.journeyName : "flight_dashboard"`.

Custom metrics reported with EBT are available in the [Telemetry Staging](https://telemetry-v2-staging.elastic.dev/) cluster, `kibana-performance` space.