Commit graph

1 commit

Author SHA1 Message Date
Dzmitry Lemechko
5f31ebf1ce
Benchmark single apis (#146297)
## Summary

This PR adds capability to run capacity testing for single apis #143066

Currently in main we have to 2 types of performance tests:
- single user performance journey that simulates single end-user
experience in browser
- scalability journey that uses APM traces from single user performance
journey to simulate multiple end-users experience

This new type of performance tests allow to better understand how each
single server api scale under the similar load.

How to run locally:
make sure to clone the latest main branch of
[elastic/kibana-load-testing](https://github.com/elastic/kibana-load-testing)
in Kibana repo run:
`node scripts/run_scalability.js --journey-path
x-pack/test/scalability/apis/api.core.capabilities.json`

How it works:
FTR is used to start Kibana/ES and run Gatling simulation with json file
as input. After run the latest report matching journey name is parsed to
get perf metrics and report using EBT to the Telemetry cluster.

How will it run after merge:
I plan to run pipeline every 3 hours on bare metal machine and report
metrics to Telemetry staging cluster.
<img width="2023" alt="image"
src="https://user-images.githubusercontent.com/10977896/208771628-f4f5dbcb-cb73-40c6-9aa1-4ec3fbf5285b.png">


APM traces are collected and reported to Kibana stats cluster:
<img width="1520" alt="image"
src="https://user-images.githubusercontent.com/10977896/208771323-4cca531a-eeea-4941-8b01-50b890f932b1.png">


What metrics are collected:

1. warmupAvgResponseTime - average response time during warmup phase
2. rpsAtWarmup - average requests per second during warmup phase
3. warmupDuration
4. responseTimeMetric (default: 85%) Gatling has response time
25/50/75/80/85/90/95/99 percentiles, as well as min/max values
5. threshold1ResponseTime (default 3000 ms)
6. rpsAtThreshold1 requests per second when `responseTimeMetric` first
reach threshold1ResponseTime
7. threshold2ResponseTime
8. rpsAtThreshold2 (default 9000 ms)
9.  threshold3ResponseTime
10. rpsAtThreshold3 (default 15000 ms)

As long as we agree on metrics I will update indexer for telemetry.

Co-authored-by: Alejandro Fernández Haro <alejandro.haro@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
2023-01-09 16:38:30 +01:00