* Flow metrics: initial implementation (#14509)
* metrics: eliminate race condition when registering metrics
Ensure our fast-lookup and store tables cannot diverge in a race condition
by wrapping mutation of both in a single mutex and appropriately handle
another thread winning the race to the lock by using the value that it
persisted instead of writing our own.
* metrics: guard against intermediate namespace conflicts
- ensures our safeguard that prevents using an existing metric as a namespace
is applied to _intermediate_ nodes, not just the tail-node, eliminating a
potential crash when sending `fetch_or_store` to a metric object that is not
expected to respond to `fetch_or_store`.
- uses the atomic `Concurrent::Map#compute_if_absent` instead of the
non-atomic `Concurrent::Map#fetch_or_store`, which is prone to
last-write-wins during contention (as-written, this method is only
executed under lock and not subject to contention)
- uses `Enumerable#reduce` to eliminate the need for recursion
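The two metrics commits above can be illustrated with a minimal Java sketch. It is not the actual Logstash implementation (which is Ruby and uses `Concurrent::Map`); the class `MetricStoreSketch`, the exception type, and the method shape are hypothetical names chosen for illustration. It shows one lock guarding both tables, deferring to a value persisted by another thread, an iterative walk over the namespaces, and a conflict check on every intermediate node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: names and structure are assumptions, not Logstash code.
final class MetricStoreSketch {
    static final class NamespaceConflictException extends RuntimeException {
        NamespaceConflictException(String message) { super(message); }
    }

    // Flat table for fast reads, keyed by the full path (namespaces + key).
    private final Map<List<String>, Object> fastLookup = new ConcurrentHashMap<>();
    // Nested table: each namespace maps to a child map, each leaf to a metric.
    private final Map<String, Object> store = new ConcurrentHashMap<>();
    // A single lock guards mutation of BOTH tables so they cannot diverge.
    private final Object lock = new Object();

    @SuppressWarnings("unchecked")
    Object fetchOrStore(List<String> namespaces, String key, Supplier<Object> factory) {
        List<String> fullPath = new ArrayList<>(namespaces);
        fullPath.add(key);

        // Lock-free fast path for metrics that are already registered.
        Object existing = fastLookup.get(fullPath);
        if (existing != null) return existing;

        synchronized (lock) {
            // Another thread may have won the race to the lock; use its value.
            existing = fastLookup.get(fullPath);
            if (existing != null) return existing;

            // Iterative descent (no recursion), checking every intermediate node.
            Map<String, Object> node = store;
            for (String namespace : namespaces) {
                Object child = node.computeIfAbsent(
                        namespace, k -> new ConcurrentHashMap<String, Object>());
                if (!(child instanceof Map)) {
                    throw new NamespaceConflictException(
                            "metric already registered at namespace: " + namespace);
                }
                node = (Map<String, Object>) child;
            }

            // Atomic insert of the tail node.
            Object value = node.computeIfAbsent(key, k -> factory.get());
            if (value instanceof Map) {
                throw new NamespaceConflictException(
                        "namespace already registered at key: " + key);
            }
            fastLookup.put(fullPath, value);
            return value;
        }
    }
}
```

In this sketch, registering a metric at `[stats, pipelines] / events` and then asking for `[stats, pipelines, events] / in` raises the conflict error at the intermediate `events` node instead of failing later on an object that is not a namespace.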
* flow: introduce auto-advancing UptimeMetric
* flow: introduce FlowMetric with minimal current/lifetime rates
* flow: initialize pipeline metrics at pipeline start
* Controller and service layer implementation for flow metrics. (#14514)
* Controller and service layer implementation for flow metrics.
* Add flow metrics to unit test and benchmark cli definitions.
* flow: fix tests for metric types to accommodate new one
* Renaming concurrency and backpressure metrics.
Rename `concurrency` to `worker_concurrency` and `backpressure` to `queue_backpressure` to provide proper scope naming.
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* metric: register flow metrics only when we have a collector (#14529)
the collector is absent when the pipeline is run in test with a
NullMetricExt, or when the pipeline is explicitly configured to
not collect metrics using `metric.collect: false`.
* Unit tests and integration tests added for flow metrics. (#14527)
* Unit tests and integration tests added for flow metrics.
* Node stat spec and pipeline spec metric updates.
* Metric keys statically imported, implicit error expectation added in metric spec.
* Fix node status API spec after renaming flow metrics.
* Removing flow metric from PipelinesInfo DS (used in periodic metric snapshot), integration QA updates.
* Rebasing with feature branch.
* metric: register flow metrics only when we have a collector
the collector is absent when the pipeline is run in test with a
NullMetricExt, or when the pipeline is explicitly configured to
not collect metrics using `metric.collect: false`.
* Apply suggestions from code review
Integration tests updated to test capturing the flow metrics.
* Flow metrics expectation updated in integration tests.
* flow: refine integration expectations for reloads/monitoring
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
Co-authored-by: Mashhur <mashhur.sattorov@gmail.com>
* metric: add ScaledView with sub-unit precision to UptimeMetric (#14525)
* metric: add ScaledView with sub-unit precision to UptimeMetric
By presenting a _view_ of our metric that maintains sub-unit precision,
we prevent jitter that can be caused by our periodic poller not running at
exactly our configured cadence.
This is especially important because the UptimeMetric is used as the _denominator_ of
several flow metrics: a capture at 4.999s that truncates to 4s causes the
rate to be over-reported by ~25%.
The `UptimeMetric.ScaledView` implements `Metric<Number>`, so its full
lossless `BigDecimal` value is accessible to our `FlowMetric` at query time.
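The arithmetic behind the ~25% figure can be checked with a small Java sketch. The class and method names below (`UptimeScalingSketch`, `truncatedSeconds`, `preciseSeconds`, `rate`) are hypothetical and do not mirror the real `UptimeMetric.ScaledView` API; the sketch only contrasts a truncated whole-second denominator with a lossless `BigDecimal` one.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.time.Duration;

// Hypothetical sketch: illustrates the arithmetic only, not the ScaledView API.
final class UptimeScalingSketch {
    // Whole seconds only: 4.999s becomes 4s.
    static BigDecimal truncatedSeconds(Duration uptime) {
        return BigDecimal.valueOf(uptime.getSeconds());
    }

    // Lossless view: nanoseconds expressed as seconds with scale 9.
    static BigDecimal preciseSeconds(Duration uptime) {
        return BigDecimal.valueOf(uptime.toNanos(), 9);
    }

    // Events per second for a given denominator.
    static double rate(long events, BigDecimal seconds) {
        return BigDecimal.valueOf(events)
                .divide(seconds, MathContext.DECIMAL64)
                .doubleValue();
    }
}
```

For 500 events captured at an uptime of 4.999s, the truncated denominator yields 125 events/s while the precise one yields roughly 100.02 events/s, a ratio of about 1.25.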
* metrics: reduce window for too-frequent-captures bug and document it
* fixup: provide mocked clock to flow metric
* Flow metrics cleanup (#14535)
* flow metrics: code-style and readability pass
* remove unused imports
* cleanup: simplify usage of internal helpers
* flow: migrate internals to use OptionalDouble
* Flow metrics global (#14539)
* flow: add global top-level flows
* docs: add flow metrics
* Top level flow metrics unit tests added. (#14540)
* Top level flow metrics unit tests added.
* Add unit tests for config reloads to make sure top-level flow metrics are not reset.
* Apply suggestions from code review
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* Validating against Hash test cases updated.
* For the safety check against exact type in unit tests.
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* docs: section links and clarity in node stats API flow metrics
Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
Co-authored-by: Mashhur <mashhur.sattorov@gmail.com>
This commit:
- Updates the Gradle wrapper to version 7.2
- Removes the deprecated jcenter repository and, where it was used to retrieve Gradle plugins, switches to gradlePluginPortal
- Inserts an explicit dependency from the test task to the log4j.properties manipulation task ("copyProductionLog4jConfiguration") used in integration
Starting with version 7.10.0, the names of LS packages changed to include the OS and CPU architecture. This change broke the downloading of those packages by the benchmarking tool. This commit fixes it by composing the package name correctly for the version being downloaded.
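A version-dependent name could be composed along the following lines. This is a sketch only: the `logstash-<version>[-<os>-<arch>].tar.gz` pattern, the class `PackageNameSketch`, and its method names are assumptions made for illustration, not taken from the benchmarking tool.

```java
// Hypothetical sketch: the file name pattern and helper names are assumptions.
final class PackageNameSketch {
    // True when `version` (e.g. "7.10.0") is at least major.minor.
    static boolean atLeast(String version, int major, int minor) {
        String[] parts = version.split("\\.");
        int maj = Integer.parseInt(parts[0]);
        int min = Integer.parseInt(parts[1]);
        return maj > major || (maj == major && min >= minor);
    }

    // From 7.10.0 onward, append OS and CPU architecture to the base name.
    static String archiveName(String version, String os, String arch) {
        String base = "logstash-" + version;
        if (atLeast(version, 7, 10)) {
            base += "-" + os + "-" + arch;
        }
        return base + ".tar.gz";
    }
}
```

Comparing the numeric components rather than the raw strings matters here, since a plain string comparison would order "7.10.0" before "7.9.0".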
Since the introduction of this block:
```
"pipeline" : {
"workers" : 16,
"batch_size" : 125,
"batch_delay" : 50
},
```
to the node stats API, the benchmarking tool has been broken. This commit fixes the
tool, and updates the payload in the tests to reflect the current payload.
Converting a value under 1_000_000_000 nanoseconds to whole seconds yields 0, which led to NaN when the result was used as the denominator in a division.
For example, 996_920_400 nanoseconds converted to seconds is not rounded to 1 second but truncated to 0; this manifests on Windows.
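The truncation can be reproduced with a short Java sketch. The class `ElapsedSecondsSketch` and the method `perSecond` are hypothetical, and dividing by fractional seconds with a zero guard is one possible remedy, not necessarily the one applied in this commit.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: shows the truncation and one possible guard.
final class ElapsedSecondsSketch {
    // Whole-second conversion: anything under one second becomes 0.
    static long wholeSeconds(long elapsedNanos) {
        return TimeUnit.NANOSECONDS.toSeconds(elapsedNanos);
    }

    // Rate computed against fractional seconds; a non-positive interval
    // returns 0.0 instead of NaN or Infinity (an assumption of this sketch).
    static double perSecond(long count, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        double seconds = elapsedNanos / 1_000_000_000.0;
        return count / seconds;
    }
}
```

With the whole-second conversion, `wholeSeconds(996_920_400L)` is 0, and a floating-point division such as `0.0 / 0` evaluates to NaN; the fractional version keeps the denominator at roughly 0.997 seconds.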
* Update gradle version to 6.3
Gradle versions prior to 6.3 cannot run under JDK14.
This commit upgrades the version of Gradle to 6.3, and removes all deprecation warnings that can currently be removed.
Changes include:
* Increase gradle memory to 2g
* Increase gradle memory in the license check job to 2g
* Replace use of `testCompile`
* Replace `runtime` with `runtimeOnly`
* Remove `compile` dependencies from gradle files
* Replace deprecated archive methods
* Fix dependencies report build
* Make jruby dependencies 'api', fix archiveVersion
* Set `duplicatesStrategy` for all tasks of type Copy
* Use `configureEach` for global 'withType' calls
** Use the recommended Tasks API calls
(https://blog.gradle.org/preview-avoiding-task-configuration-time)
* Run `./gradlew wrapper` earlier to improve caching
* Use copy with chown for resources that need to be run during `./gradlew wrapper`