---
id: kibDevTutorialAddingPerformanceMetrics
slug: /kibana-dev-docs/tutorial/adding_performance_metrics
title: Adding Performance Metrics
summary: Learn how to instrument your code and analyze performance
date: 2022-07-07
tags: ['kibana', 'onboarding', 'setup', 'performance', 'development', 'telemetry']
---

## Reporting performance events

### Simple performance events

Let's assume we intend to report the performance of a specific action called `APP_ACTION`.
To do so, we first need to measure the timing of that action.
Once we have the time measurement, we can use the `reportPerformanceMetricEvent` API to report it.

The most basic form of reporting would be:

```typescript
reportPerformanceMetricEvent(analytics, {
  eventName: APP_ACTION,
  duration, // Duration in milliseconds
});
```
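
For instance, a minimal sketch of measuring such an action and reporting it might look like the following (here `doAppAction` and the `APP_ACTION` value are hypothetical stand-ins for your own code, and `analytics` is assumed to be core's analytics service contract):

```typescript
import { reportPerformanceMetricEvent } from '@kbn/ebt-tools';
import type { AnalyticsServiceStart } from '@kbn/core/public';

declare function doAppAction(): Promise<void>; // stand-in for the action being measured

const APP_ACTION = 'app_action'; // hypothetical event name

async function runAndReportAppAction(analytics: AnalyticsServiceStart) {
  const start = performance.now();
  await doAppAction();
  reportPerformanceMetricEvent(analytics, {
    eventName: APP_ACTION,
    duration: Math.round(performance.now() - start), // duration in milliseconds
  });
}
```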

Once executed, the metric is delivered to the `stack-telemetry` cluster, along with the event's context.
The data is updated periodically, so you might have to wait up to 30 minutes to see your data in the index.

Once indexed, this metric will appear in the `ebt-kibana` index. It is also mapped into an additional index dedicated to performance metrics.
We recommend using the `Kibana Performance` space on the telemetry cluster, where you get an index pattern to easily access this data.
Each document in the index has the following structure:

```typescript
{
  "_index": "backing-ebt-kibana-browser-performance-metrics-000001", // Performance metrics are stored in a dedicated simplified index (browser / server).
  "_source": {
    "timestamp": "2022-08-31T11:29:58.275Z",
    "event_type": "performance_metric", // All events share a common event type to simplify mapping
    "eventName": APP_ACTION, // Event name as specified when reporting it
    "duration": 736, // Event duration as specified when reporting it
    "context": { // Context holds information identifying the deployment, version, application and page that generated the event
      "version": "8.5.0-SNAPSHOT",
      "applicationId": "dashboards",
      "page": "app",
      "entityId": "61c58ad0-3dd3-11e8-b2b9-5d5dc1715159",
      "branch": "main",
      "labels": {
        "journeyName": "flight_dashboard",
        ...
      },
      ...
    },
    ...
  },
}
```

### Performance events with breakdowns and metadata

Let's assume we are interested in benchmarking the performance of a more complex event, `COMPLEX_APP_ACTION`, that is made up of two steps:

- `INSPECT_DATA` measures the time it takes to retrieve a user's profile and check whether there is a cached version of their data.
  - If the cached data is fresh, it proceeds with the flow `use-local-data`.
  - If the data needs to be refreshed, it proceeds with the flow `load-data-from-api`.
- `PROCESS_DATA` loads and processes the data, depending on the flow chosen in the previous step.

We can use the additional options supported by the `reportPerformanceMetricEvent` API:

```typescript
import { reportPerformanceMetricEvent } from '@kbn/ebt-tools';

reportPerformanceMetricEvent(analytics, {
  eventName: COMPLEX_APP_ACTION,
  duration, // Total duration in milliseconds
  key1: INSPECT_DATA, // Claiming free key1 to be used for INSPECT_DATA
  value1: durationOfStepA, // Total duration of step INSPECT_DATA in milliseconds
  key2: PROCESS_DATA, // Claiming free key2 to be used for PROCESS_DATA
  value2: durationOfStepB, // Total duration of step PROCESS_DATA in milliseconds
  meta: {
    dataSource: 'load-data-from-api', // Providing event-specific context. This can be useful to create meaningful aggregations.
  },
});
```

This event will be indexed with the following structure:

```typescript
{
  "_index": "backing-ebt-kibana-browser-performance-metrics-000001", // Performance metrics are stored in a dedicated simplified index (browser / server).
  "_source": {
    "timestamp": "2022-08-31T11:29:58.275Z",
    "event_type": "performance_metric", // All events share a common event type to simplify mapping
    "eventName": COMPLEX_APP_ACTION, // Event name as specified when reporting it
    "duration": 736, // Event duration as specified when reporting it
    "key1": INSPECT_DATA, // The key name of INSPECT_DATA
    "value1": 250, // The duration of step INSPECT_DATA
    "key2": PROCESS_DATA, // The key name of PROCESS_DATA
    "value2": 520, // The duration of step PROCESS_DATA
    "meta": {
      "dataSource": "load-data-from-api",
    },
    "context": { // Context holds information identifying the deployment, version, application and page that generated the event
      "version": "8.5.0-SNAPSHOT",
      "cluster_name": "job-ftr_configs_2-cluster-ftr",
      "pageName": "application:dashboards:app",
      "applicationId": "dashboards",
      "page": "app",
      "entityId": "61c58ad0-3dd3-11e8-b2b9-5d5dc1715159",
      "branch": "main",
      "labels": {
        "journeyName": "flight_dashboard",
      },
      ...
    },
    ...
  },
}
```

The performance metrics API supports **5 numbered free fields** that can be used to report numeric metrics that you intend to analyze.
They can hold any kind of numeric information you may want to report, letting you create your own flexible schema
without having to add custom mappings.

If you want to provide event-specific context, you can add properties to the `meta` field.
The `meta` object is stored as a [flattened field](https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html), hence
it is searchable and can be used to further break down event metrics.

**Note**: It's important to keep in mind that `free field` values are stored as integers, so floating point values will be rounded.
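
Putting this together, a minimal sketch of measuring and reporting both steps might look like the following (the `inspectData` and `processData` helpers, the event name, and the key names are all hypothetical):

```typescript
import { reportPerformanceMetricEvent } from '@kbn/ebt-tools';
import type { AnalyticsServiceStart } from '@kbn/core/public';

declare function inspectData(): Promise<{ isFresh: boolean }>; // hypothetical step 1
declare function processData(): Promise<void>; // hypothetical step 2

async function runComplexAppAction(analytics: AnalyticsServiceStart) {
  const start = performance.now();
  const { isFresh } = await inspectData();
  const inspectDone = performance.now();
  await processData();
  const processDone = performance.now();

  reportPerformanceMetricEvent(analytics, {
    eventName: 'complex_app_action', // hypothetical event name
    duration: Math.round(processDone - start), // total duration in milliseconds
    key1: 'inspect_data', // hypothetical key names for the step breakdowns
    value1: Math.round(inspectDone - start),
    key2: 'process_data',
    value2: Math.round(processDone - inspectDone),
    meta: { dataSource: isFresh ? 'use-local-data' : 'load-data-from-api' },
  });
}
```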

### How to choose and measure events

Events should be meaningful and can have multiple sub-metrics that provide specific information about certain actions. For example,
a page-load event can be composed of the render time, the data load time during the page-load, and so on. Keep in mind that these
events are meant to drive performance investigations and will be used in visualizations and aggregations. With that in mind,
creating an event for something like a raw `cpuUsage` reading does not bring any value, because it carries no context of its own,
and reporting it from many different places in the code introduces too much variability into the analysis. It can, however, be a
useful attribute to follow inside a specific event, e.g. `page-load`.

- **Understand your events**.
  Make sure that the event is clearly defined and consistent (i.e. the same code flow is executed each time).
  Consider the start point and end point of the measurement and what happens between those points.
  For example: an `app-data-load` event should not include the time it takes to render the data.
- **Choose event names wisely**.
  Try to balance event name specificity. Calling an event `load` is too generic, while calling an event `tsvb-data-load` is too specific (instead, the visualization
  type can be specified in a `meta` field).
- **Distinguish between flows with event context**.
  If a function that loads data is called when an app loads, when the user changes filters, and when the refresh button is clicked, you should distinguish between
  these flows by specifying a `meta` field.
- **Avoid duplicate events**.
  Make sure that measurement and reporting happen at a point in the code that is executed only once.
  For example, make sure that refresh events are reported only once per button click.
- **Measure as close to the event as possible**.
  For example, if you're measuring the execution of a specific React effect, place the measurement code inside the effect.
  If you're measuring a navigation, start the measurement right before the navigation is performed and stop measuring as soon as all resources are loaded.
- **Use the `window.performance` API**.
  The [`performance.now()`](https://developer.mozilla.org/en-US/docs/Web/API/Performance/now) API provides an accurate way to obtain timestamps.
  The [`performance.mark()`](https://developer.mozilla.org/en-US/docs/Web/API/Performance/mark) API can be used to track performance without having to pollute the
  code (see the sketch after this list).
- **Keep performance in mind**. Reporting the performance of Kibana should never harm its own performance.
  Avoid sending events too frequently (e.g. `onMouseMove`) or adding serialized JSON objects (whole `SavedObjects`) to the meta object.
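
As an illustration of the last two points, here is a minimal, hypothetical sketch of using `performance.mark()` and `performance.measure()` to time a data load without threading start timestamps through the code (the mark names and the `loadData` helper are made up for the example):

```typescript
import { reportPerformanceMetricEvent } from '@kbn/ebt-tools';
import type { AnalyticsServiceStart } from '@kbn/core/public';

declare function loadData(): Promise<void>; // stand-in for your data-loading code

async function loadAndReport(analytics: AnalyticsServiceStart) {
  performance.mark('app-data-load:start'); // mark names are illustrative
  await loadData();
  performance.mark('app-data-load:end');

  // In modern browsers, performance.measure returns the PerformanceMeasure
  // entry, which carries the elapsed duration between the two marks.
  const measure = performance.measure('app-data-load', 'app-data-load:start', 'app-data-load:end');

  reportPerformanceMetricEvent(analytics, {
    eventName: 'app-data-load',
    duration: Math.round(measure.duration),
  });
}
```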

### Benchmarking performance on CI

One of the use cases for event-based telemetry is benchmarking the performance of features over time.
In order to keep track of their stability, the #kibana-performance team has developed a special set of
functional tests called `Journeys`. These journeys execute a UI workflow and allow the telemetry to be
reported to a cluster where it can then be analyzed.

Those journeys run on key branches (main and release versions) on dedicated machines, to produce results
that are as stable and reproducible as possible.

#### Machine specifications

All benchmarks are run on bare-metal machines with the [following specifications](https://www.hetzner.com/dedicated-rootserver/ex100):

- CPU: Intel® Core™ i9-12900K
- RAM: 128 GB
- SSD: 1.92 TB Datacenter Gen4 NVMe

Since the tests are run on a local machine, realistic throttling is also applied to the network to
simulate a real-life internet connection. This means that all requests have a [fixed latency and limited bandwidth](https://github.com/elastic/kibana/blob/main/x-pack/test/performance/services/performance.ts#L157).
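
For orientation, this kind of throttling is typically applied through the Chrome DevTools Protocol; a hypothetical sketch (the latency and throughput values below are illustrative, not the ones Kibana uses — those live in the linked service):

```typescript
import { chromium } from 'playwright';

async function launchThrottledPage() {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Open a CDP session and emulate constrained network conditions for the page.
  const session = await page.context().newCDPSession(page);
  await session.send('Network.emulateNetworkConditions', {
    offline: false,
    latency: 100, // fixed latency in milliseconds (illustrative)
    downloadThroughput: (1.5 * 1024 * 1024) / 8, // bytes per second (illustrative)
    uploadThroughput: (750 * 1024) / 8, // bytes per second (illustrative)
  });

  return { browser, page };
}
```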

#### Journey implementation

If you would like to keep track of the stability of your events, implement a journey by adding a functional
test to the `x-pack/test/performance/journeys` folder, along the lines of the sketch below.
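
A journey is roughly shaped like the following hypothetical sketch (the archive path, URL, and selectors are placeholders; check the existing journeys in that folder for the authoritative `Journey` API):

```typescript
import { Journey } from '@kbn/journeys';

// A hypothetical minimal journey; the archive path and selectors below are
// placeholders, not real fixtures.
export const journey = new Journey({
  kbnArchives: ['x-pack/test/performance/kbn_archives/my_dashboard'],
})
  .step('Go to Dashboards Page', async ({ page, kbnUrl }) => {
    await page.goto(kbnUrl.get('/app/dashboards'));
    await page.waitForSelector('#dashboardListingHeading');
  })
  .step('Open My Dashboard', async ({ page }) => {
    await page.click('[data-test-subj="dashboardListingTitleLink-My-Dashboard"]');
  });
```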

The telemetry collected during the execution of those journeys is reported to the `telemetry-v2-staging` cluster,
along with the execution context. Use the `context.labels.ciBuildName` label to filter events down to only those originating
from performance runs, and visualize the duration of events (or their breakdowns).

Run the test locally for troubleshooting purposes by running:

```
node scripts/functional_test_runner --config x-pack/test/performance/journeys/$YOUR_JOURNEY_NAME/config.ts
```

#### Analyzing journey results

- Be sure to narrow your analysis down to performance events by specifying the filter `context.labels.ciBuildName: kibana-single-user-performance`.
  Otherwise you might be looking at results originating from different hardware.
- You can look at the results of a specific journey by filtering on `context.labels.journeyName`.

Please contact the #kibana-performance team if you need more help visualizing and tracking the results.

### Production performance tracking

All users who are opted in to report telemetry will start reporting event-based telemetry as well.
The data is available to be analyzed on the production telemetry cluster.

## Analytics Client

The Analytics Client holds the public APIs to report events, enrich the events' context, and set up the transport mechanisms. Please check out the
[Analytics Client](https://github.com/elastic/kibana/blob/main/packages/analytics/README.md) package documentation for more information.
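
As a rough, hypothetical sketch of what working with the client looks like (the event type, schema, and the core contract types are assumptions for illustration; the README above is the authoritative reference):

```typescript
import type { AnalyticsServiceSetup, AnalyticsServiceStart } from '@kbn/core/public';

// Register a hypothetical custom event type during your plugin's setup phase...
export function registerMyEvents(analytics: AnalyticsServiceSetup) {
  analytics.registerEventType({
    eventType: 'my_plugin_action', // hypothetical event type
    schema: {
      durationMs: {
        type: 'long',
        _meta: { description: 'Duration of the action in milliseconds' },
      },
    },
  });
}

// ...and report events against it later, e.g. from the start contract.
export function reportMyEvent(analytics: AnalyticsServiceStart) {
  analytics.reportEvent('my_plugin_action', { durationMs: 123 });
}
```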