
71 commits

Author SHA1 Message Date
Lorenzo Dematté
ef8a8bf654
Update APM Java Agent to support JDK 23 (#115194) (#115237) 2024-10-22 02:18:55 +11:00
Mark Vieira
0279c0a909
Add AGPLv3 as a supported license 2024-09-13 14:30:33 -07:00
Jan Kuipers
26623f14d8
Inference autoscaling telemetry (#110630)
* Wire MeterRegistry

* Allow for collections of values in async APM measurements

* Adaptive allocations scaler metrics

* Update docs/changelog/110630.yaml

* Update 110630.yaml
2024-07-29 10:38:18 +02:00
Moritz Mack
7ee5c73e7d
Deprecate Telemetry / APM legacy settings in favor of the new telemetry.* settings (#104908) 2024-04-11 08:44:20 +02:00
Dmitry Cherniachenko
e21a4874ab
Use String.replace() instead of replaceAll() for non-regexp replacements (#105127)
* Use String.replace() instead of replaceAll() for non-regexp replacements

When the arguments do not use regexp features, replace() is a more efficient option, especially the char variant.
2024-02-12 13:11:15 -05:00
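A minimal sketch of why the change matters, using only the standard `java.lang.String` API: `replace(char, char)` treats its arguments literally, while `replaceAll(String, String)` compiles its first argument as a regular expression. `ReplaceDemo` is an illustrative class, not Elasticsearch code.

```java
import java.util.regex.Pattern;

// Illustrative demo class, not Elasticsearch code.
final class ReplaceDemo {
    // Literal char replacement: no regular expression is compiled.
    static String dotsToDashes(String s) {
        return s.replace('.', '-');
    }

    // Pitfall: replaceAll treats "." as "any character", so every char is replaced.
    static String dotsToDashesWithRegex(String s) {
        return s.replaceAll(".", "-");
    }

    // replaceAll only matches a literal dot when the pattern is quoted.
    static String dotsToDashesQuoted(String s) {
        return s.replaceAll(Pattern.quote("."), "-");
    }

    public static void main(String[] args) {
        System.out.println(dotsToDashes("a.b"));          // a-b
        System.out.println(dotsToDashesWithRegex("a.b")); // ---
        System.out.println(dotsToDashesQuoted("a.b"));    // a-b
    }
}
```

For literal replacements the char variant avoids both the regex compilation cost and the metacharacter pitfall.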
Moritz Mack
54088839b4
Do not enable APM agent 'instrument', it's not required for manual tracing. (#105055) 2024-02-02 18:13:00 +01:00
Moritz Mack
dbf59c5414
Update/Cleanup references to old tracing.apm.* legacy settings in favor of the telemetry.* settings (#104917) 2024-01-31 09:20:05 +01:00
Moritz Mack
9ea187dd76
Fix enabling / disabling of APM agent "recording" in APMAgentSettings (#104324) 2024-01-30 17:29:21 +01:00
Moritz Mack
a3b1d86c45
Reuse APMMeterService of APMTelemetryProvider (#104906) 2024-01-30 15:49:56 +01:00
Moritz Mack
35cc9e1159
New APM settings using telemetry. prefix deprecate ambiguous tracing.apm. settings. (#104376)
Telemetry / APM settings are renamed from "tracing.apm.{name}" to "telemetry.tracing.{name}" for tracing-related settings. General APM settings are renamed to "telemetry.{name}". The old legacy settings are kept for now and applied as a fallback.
2024-01-30 09:34:03 +01:00
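The renaming scheme and its legacy fallback can be sketched as follows. This is a hypothetical helper that models settings as a plain `Map` rather than the Elasticsearch `Settings` API; whether a given setting is tracing-related is passed in by the caller as an assumption.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: models the fallback with a plain Map rather than
// the Elasticsearch Settings API.
final class TelemetrySettingsFallback {
    static final String LEGACY_PREFIX = "tracing.apm.";

    // "telemetry.tracing.{name}" for tracing-related settings,
    // "telemetry.{name}" for general APM settings.
    static String newKey(String name, boolean tracingRelated) {
        return tracingRelated ? "telemetry.tracing." + name : "telemetry." + name;
    }

    // Prefer the new key; fall back to the legacy "tracing.apm.{name}" key.
    static Optional<String> resolve(Map<String, String> settings, String name, boolean tracingRelated) {
        String value = settings.get(newKey(name, tracingRelated));
        if (value == null) {
            value = settings.get(LEGACY_PREFIX + name);
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        Map<String, String> settings = Map.of("tracing.apm.enabled", "true");
        System.out.println(resolve(settings, "enabled", true)); // Optional[true]
    }
}
```

Resolving the new key first means a node that sets both keys gets the new value, while configurations that only use the legacy key keep working.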
Stuart Tettemer
3493e425ac
Metrics: Agent settings prefix telemetry.agent preferred over tracing.apm.agent (#104345)
Prefer the telemetry.agent prefix for APM agent settings.

Add a fallback prefix to Affix settings to support migrating between an old prefix
and a new prefix.
2024-01-22 12:53:36 -06:00
Gareth Ellis
764269b395
Thread pool metrics (#104500)
This implements metrics for the threadpools.

The aim is to emit metrics for the various threadpools: the metric callback should be created when the threadpool is created, and removed before the threadpool is shut down.

The PR also includes a test for the new metrics, and some additions to the metrics test plugin.

Finally, the metric name check has been modified to allow some of the non-compliant threadpool names (too long, or containing "-").
Co-authored-by: Przemyslaw Gomulka <przemyslaw.gomulka@elastic.co>
2024-01-18 10:15:58 +01:00
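The lifecycle described above (callback created with the pool, removed before shutdown) can be sketched with a plain `ThreadPoolExecutor`. `GaugeCallbacks` and `InstrumentedPool` are hypothetical stand-ins, and the metric name format is an assumption for illustration only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical registry of gauge callbacks keyed by metric name.
final class GaugeCallbacks {
    final Map<String, LongSupplier> callbacks = new ConcurrentHashMap<>();

    // Returns a handle that unregisters the callback when run.
    Runnable register(String name, LongSupplier callback) {
        callbacks.put(name, callback);
        return () -> callbacks.remove(name);
    }
}

// Hypothetical wrapper: the callback is created together with the pool
// and removed before the pool is shut down.
final class InstrumentedPool {
    private final ThreadPoolExecutor executor;
    private final Runnable unregisterGauge;

    InstrumentedPool(String poolName, int threads, GaugeCallbacks registry) {
        executor = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        // Assumed metric name format, for illustration only.
        unregisterGauge = registry.register("es.thread_pool." + poolName + ".threads.active.current", executor::getActiveCount);
    }

    void shutdown() {
        unregisterGauge.run(); // stop reporting before the pool goes away
        executor.shutdown();
    }

    public static void main(String[] args) {
        GaugeCallbacks registry = new GaugeCallbacks();
        InstrumentedPool pool = new InstrumentedPool("search", 2, registry);
        System.out.println(registry.callbacks.keySet());
        pool.shutdown();
        System.out.println(registry.callbacks.isEmpty()); // true
    }
}
```

Unregistering first ensures the exporter never polls a pool that is already shutting down.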
Przemyslaw Gomulka
aa42368dba
Revert "Adding threadpool metrics (#102371)" (#104467)
This reverts commit afd915af1e.
2024-01-17 17:01:28 +01:00
Gareth Ellis
afd915af1e
Adding threadpool metrics (#102371)
This implements metrics for the threadpools.

The aim is to emit metrics for the various threadpools: the metric callback should be created when the threadpool is created, and removed before the threadpool is shut down.

The PR also includes a test for the new metrics, and some additions to the metrics test plugin.

Finally, the metric name check has been modified to allow some of the non-compliant threadpool names (too long, or containing "-").

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Przemyslaw Gomulka <przemyslaw.gomulka@elastic.co>
2024-01-17 15:27:12 +01:00
Moritz Mack
c11f329720
Temporarily tolerate tracing.apm.agent.global_labels.XYZ settings (#104317) 2024-01-12 16:16:27 +01:00
Moritz Mack
063fc26a20
Temporarily tolerate tracing.apm.agent.global_labels.XYZ settings (#104315) 2024-01-12 14:56:32 +01:00
Moritz Mack
00ca64bcf2
Use allow-list for APM agent settings and consolidate defaults in APMJvmOptions (#104141)
Prevent invalid settings and misconfiguration of the APM agent by using an explicit allow-list of setting keys.
Additionally, configuration defaults of APMAgentSettings are consolidated in APMJvmOptions to keep defaults in a single location.
(ES-6916)
2024-01-12 10:30:04 +01:00
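An explicit allow-list check can be sketched as below. This is a hypothetical sketch: the allowed keys are example APM agent option names, not the list Elasticsearch actually uses, and the error message is illustrative.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; the allow-listed keys are examples, not the real list.
final class AgentSettingsAllowList {
    static final Set<String> ALLOWED_KEYS = Set.of("server_url", "service_name", "log_level", "metrics_interval");

    // Rejects any agent setting key that is not explicitly allowed and
    // returns the accepted settings sorted by key.
    static Map<String, String> validate(Map<String, String> agentSettings) {
        for (String key : agentSettings.keySet()) {
            if (!ALLOWED_KEYS.contains(key)) {
                throw new IllegalArgumentException("unsupported APM agent setting [" + key + "]");
            }
        }
        return new TreeMap<>(agentSettings);
    }

    public static void main(String[] args) {
        System.out.println(validate(Map.of("log_level", "debug"))); // {log_level=debug}
        try {
            validate(Map.of("not_a_real_setting", "x"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast on unknown keys turns a typo in a node's configuration into a startup error instead of a silently ignored agent option.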
Przemyslaw Gomulka
07780a8282
Add metric name validation (#103388)
This commit adds minimal metric name validation, which checks that:

* the metric name starts with the es. prefix
* the metric name uses . as the separator between elements
* the metric name uses only characters from an allow list
* the name has at least 3 elements (prefix, group, and suffix name)
* the name does not exceed a maximum number of elements or a maximum number of characters per element
* the suffix element of the metric name is from an enumerated allow list

It also modifies existing metric names to adhere to those rules.
2024-01-09 11:05:00 +01:00
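The rules above can be sketched as a single validation method. This is a hypothetical sketch: the element limits, allowed character set, and suffix list are illustrative assumptions, not the values Elasticsearch uses.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of the rules above. The limits, character set and
// suffix list are illustrative assumptions, not Elasticsearch's values.
final class MetricNameCheck {
    static final int MIN_ELEMENTS = 3;        // prefix, group, suffix
    static final int MAX_ELEMENTS = 10;       // assumed limit
    static final int MAX_ELEMENT_LENGTH = 30; // assumed limit
    static final Pattern ALLOWED_CHARS = Pattern.compile("[a-z0-9_]+"); // assumed
    static final Set<String> ALLOWED_SUFFIXES = Set.of("total", "current", "ratio", "size", "time"); // assumed

    static void validate(String name) {
        // limit -1 keeps trailing empty elements, so "es.a." is rejected too
        String[] elements = name.split("\\.", -1);
        if (!elements[0].equals("es")) {
            fail(name, "must start with the es. prefix");
        }
        if (elements.length < MIN_ELEMENTS || elements.length > MAX_ELEMENTS) {
            fail(name, "must have between " + MIN_ELEMENTS + " and " + MAX_ELEMENTS + " elements");
        }
        for (String element : elements) {
            if (element.length() > MAX_ELEMENT_LENGTH) {
                fail(name, "has an element longer than " + MAX_ELEMENT_LENGTH + " characters");
            }
            if (!ALLOWED_CHARS.matcher(element).matches()) {
                fail(name, "contains an empty element or a disallowed character");
            }
        }
        String suffix = elements[elements.length - 1];
        if (!ALLOWED_SUFFIXES.contains(suffix)) {
            fail(name, "must end with one of " + ALLOWED_SUFFIXES);
        }
    }

    static boolean isValid(String name) {
        try {
            validate(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    private static void fail(String name, String reason) {
        throw new IllegalArgumentException("metric name [" + name + "] " + reason);
    }

    public static void main(String[] args) {
        System.out.println(isValid("es.thread_pool.active.current")); // true
        System.out.println(isValid("es.search-pool.total"));          // false
    }
}
```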
Stuart Tettemer
0ce3e8cb71
APM: Add TraceContext interface to reduce server dependencies in telemetry (#103635)
`Tracer.startTrace(ThreadContext threadContext, Traceable traceable, String name, Map<String, Object> attributes)` takes in a `ThreadContext`, which creates a dependency on `server`. This change adds a new interface, `TraceContext`, with the methods required from `ThreadContext`, and uses it as the parameter to `startTrace`.

The methods in the `TraceContext` interface are
```java
public interface TraceContext {
    <T> T getTransient(String key);
    void putTransient(String key, Object value);
    String getHeader(String key);
    void putHeader(String key, String value);
}
```

which are needed for getting the parent context, the remote headers, the x-opaque-id and setting the remote headers for a trace.

This is an ugly but functional way to remove a dependency on server so that telemetry can be moved into a library for static metric registration.
2023-12-27 13:09:52 -06:00
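Because `TraceContext` only needs the four methods listed above, code outside `server` can satisfy it with simple storage. `MapTraceContext` below is a hypothetical map-backed implementation (for example, for tests in a library); it is not part of Elasticsearch and does not reproduce `ThreadContext` semantics such as rejecting duplicate transients.

```java
import java.util.HashMap;
import java.util.Map;

// The four methods listed in the commit message.
interface TraceContext {
    <T> T getTransient(String key);
    void putTransient(String key, Object value);
    String getHeader(String key);
    void putHeader(String key, String value);
}

// Hypothetical map-backed implementation with no dependency on server.
final class MapTraceContext implements TraceContext {
    private final Map<String, Object> transients = new HashMap<>();
    private final Map<String, String> headers = new HashMap<>();

    @Override
    @SuppressWarnings("unchecked")
    public <T> T getTransient(String key) {
        return (T) transients.get(key); // caller chooses the expected type
    }

    @Override
    public void putTransient(String key, Object value) {
        transients.put(key, value);
    }

    @Override
    public String getHeader(String key) {
        return headers.get(key);
    }

    @Override
    public void putHeader(String key, String value) {
        headers.put(key, value);
    }

    public static void main(String[] args) {
        MapTraceContext context = new MapTraceContext();
        context.putHeader("X-Opaque-Id", "my-request");
        System.out.println(context.getHeader("X-Opaque-Id")); // my-request
    }
}
```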
Stuart Tettemer
762e744aab
APM: Add Traceable interface to reduce server dependencies in telemetry (#103593)
`Tracer.startTrace` has overloaded methods accepting `RestRequest` and `Task`. To implement static registration, we want to move APM into a library. Accepting these two types as parameters creates a dependency on server that must be removed before telemetry can move into a library.

This change removes the overloaded method in favor of the `Traceable` interface, which allows `Tracer` to get the `SpanId`.

The `ThreadContext` argument also creates a dependency on server, but that is more involved and will be addressed in another change.
2023-12-27 09:00:31 -06:00
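The effect of replacing per-type overloads with one interface can be sketched as follows. The shape of `Traceable` (a single `getSpanId()` accessor returning a `String`) is an assumption, and `FakeTask`, `FakeRestRequest`, and `SketchTracer` are hypothetical stand-ins for the server types and the tracer.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape: a single accessor for the span id.
interface Traceable {
    String getSpanId();
}

// Hypothetical stand-ins for server types such as Task and RestRequest.
record FakeTask(long id) implements Traceable {
    @Override
    public String getSpanId() {
        return "task-" + id;
    }
}

record FakeRestRequest(long requestId) implements Traceable {
    @Override
    public String getSpanId() {
        return "rest-" + requestId;
    }
}

// Hypothetical tracer: one startTrace method instead of one overload per type,
// so the tracer never needs to know about server classes.
final class SketchTracer {
    private final Map<String, String> activeSpans = new HashMap<>();

    void startTrace(Traceable traceable, String name) {
        activeSpans.put(traceable.getSpanId(), name);
    }

    void stopTrace(Traceable traceable) {
        activeSpans.remove(traceable.getSpanId());
    }

    int activeSpanCount() {
        return activeSpans.size();
    }

    public static void main(String[] args) {
        SketchTracer tracer = new SketchTracer();
        tracer.startTrace(new FakeTask(7), "search");
        tracer.startTrace(new FakeRestRequest(1), "GET /_search");
        System.out.println(tracer.activeSpanCount()); // 2
    }
}
```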
Przemyslaw Gomulka
52c2011e84
Add debug logger when registering a metric (#103458)
This commit adds a log message at debug level when a metric is registered. This makes it possible to start Elasticsearch with the log level changed for the apm package and get a list of registered metric names.
It should be useful when trying to come up with a new metric name. The command to run is:
./gradlew run -Dtests.es.logger.org.elasticsearch.telemetry.apm=debug
2023-12-18 10:38:13 +01:00
Przemyslaw Gomulka
12f5f96345
[fix] readme for APM metrics (#103428)
In the previous PR (#103400) I forgot to commit a code review follow-up.
2023-12-14 12:49:32 +01:00
Przemyslaw Gomulka
c39a59503c
Add readme for APM metrics (#103400)
This commit adds documentation about APM metrics usage in Elasticsearch.
2023-12-14 11:25:14 +01:00
Lorenzo Dematté
7748638551
Adding metric guidelines to the repository (#103329) 2023-12-12 19:21:20 +01:00
Przemyslaw Gomulka
543919b7f3
Enable APM tracing when --with-apm-server is used (#103268)
When the run task (gradlew run) is run with --with-apm-server, APM tracing should also be enabled.
2023-12-11 15:12:28 +01:00
Stuart Tettemer
ab12d4e362
Metrics: Handle null observations in observers (#103091)
Avoid null pointer error when an observation is null [0].
* To get the tests working, replace the this-escape buildInstrument with a Builder.
* Removes locking around close
2023-12-08 09:24:10 -06:00
Stuart Tettemer
18691593e8
Metrics: Allow AsyncCounters to switch providers (#103025)
{Double,Long}AsyncCounters were not in the registrar list so they were not included when switching providers.

Other changes:
* Moved DoubleAsyncCounter's register and get to match the order of the rest of APMMeterRegistry.
* Fixed RecordingOtelMeter's Histogram and DoubleUpDownCounter
* Fixed String format in RecordingMeter's asserts.
2023-12-08 09:22:22 -06:00
Przemyslaw Gomulka
2502af8198
x-pack:apm-data module should mention it is for APM Server (#102866)
The apm-data module should be more explicit that it is for APM Server usage. When starting up ES, it is confusing to see a log line saying APM is disabled, especially since we also have :modules:apm, which is meant for sending APM metrics and traces.

This commit rephrases the log messages and renames the APMPlugin class to mention APM Server.
2023-12-05 12:52:04 +01:00
Przemyslaw Gomulka
c11c2afc39
Update apm's otel version (#102851)
The OTel API also has to be upgraded along with the new APM agent version.
The versions are picked as per 546da88d55 (diff-b977da1986b483bc5635c37235e99d23e8825301044d4316d37d9315eff89fddR22)

Follow-up to the APM agent version upgrade in https://github.com/elastic/elasticsearch/pull/102691
2023-12-01 15:43:24 +01:00
Przemyslaw Gomulka
bb6442579a
Add apm api for asynchronous counters (always increasing) (#102598)
Asynchronous counters (available in OTel) are instruments whose value is updated through a callback-style API. The callback must always report the latest value. After a restart the value may go back to 0, but the APM server backend can detect this because the value is otherwise always increasing.
2023-11-28 20:08:05 +01:00
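The callback contract and the reset behavior described above can be sketched with a `LongSupplier`. `AsyncCounterSketch` is hypothetical: it illustrates how a consumer could turn cumulative observations into increments, treating a drop as a restart from 0. It is not the OTel API or the APM server's actual logic.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: the callback reports the latest cumulative value;
// collectIncrement() converts it into an increase since the last collection.
final class AsyncCounterSketch {
    private final LongSupplier callback;
    private long lastObserved = 0;

    AsyncCounterSketch(LongSupplier callback) {
        this.callback = callback;
    }

    // Called once per reporting interval.
    long collectIncrement() {
        long current = callback.getAsLong();
        // A monotonic counter never decreases, so a lower value is taken to mean
        // the source restarted from 0 and everything observed now is new.
        long increment = current >= lastObserved ? current - lastObserved : current;
        lastObserved = current;
        return increment;
    }

    public static void main(String[] args) {
        AtomicLong total = new AtomicLong();
        AsyncCounterSketch counter = new AsyncCounterSketch(total::get);
        total.set(5);
        System.out.println(counter.collectIncrement()); // 5
        total.set(8);
        System.out.println(counter.collectIncrement()); // 3
        total.set(2); // simulated restart
        System.out.println(counter.collectIncrement()); // 2
    }
}
```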
Przemyslaw Gomulka
2d75217d8f
Update APM agent version (#102691)
This commit upgrades the APM agent to version 1.44, which includes a fix that increases the maximum metric name length from 63 to 255 characters.
2023-11-28 13:18:39 +01:00
Stuart Tettemer
6e8d798e4b
Metrics test: testLockingWhenRegistering wait for threads to join (#101751)
The test was checking for the wrong meter instance, it should be checking
for the updated meter, `noopMeter`.

Removed the assertBusy(() -> assertThat...) which was hiding the problem.

The goal of the test is that we serialize access to registration and the
provider.  A nice way to do this is wait for the contending threads to
join (with the added benefit of avoiding thread leaks).  After the threads
join, we check that we have the expected state.

Fixes: #101725
2023-11-03 11:00:54 -05:00
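The join-then-assert pattern can be sketched with plain `Thread`s. `SketchRegistry` is a hypothetical stand-in for the meter registry: registration and provider switching share one lock, so whichever thread runs first, the instrument ends up bound to the new provider once both threads have joined. The provider names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: registration and provider switching share one lock.
final class SketchRegistry {
    private final Object lock = new Object();
    private String provider = "otel";
    private final Map<String, String> instruments = new HashMap<>();

    void register(String name) {
        synchronized (lock) {
            instruments.put(name, provider);
        }
    }

    void setProvider(String newProvider) {
        synchronized (lock) {
            provider = newProvider;
            instruments.replaceAll((name, old) -> newProvider); // rebind existing instruments
        }
    }

    String providerOf(String name) {
        synchronized (lock) {
            return instruments.get(name);
        }
    }
}

final class JoinBeforeAssert {
    // Runs both operations concurrently, then joins instead of polling.
    static String run() {
        SketchRegistry registry = new SketchRegistry();
        Thread registerThread = new Thread(() -> registry.register("es.test.total"));
        Thread switchThread = new Thread(() -> registry.setProvider("noop"));
        registerThread.start();
        switchThread.start();
        try {
            registerThread.join();
            switchThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // After both joins the state is final, so a plain check is enough.
        return registry.providerOf("es.test.total");
    }

    public static void main(String[] args) {
        System.out.println(run()); // noop
    }
}
```

Joining also guarantees the test does not leak threads.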
David Turner
cf6cfc97cd
AwaitsFix for #101725 2023-11-02 17:23:55 +00:00
Stuart Tettemer
d9054072c9
Metrics: Reject names longer than 63 characters (#101680)
OpenTelemetry supports instrument names of 63 characters or
less in versions before 1.30.

Reject longer names early to avoid runtime failures that hit the log.

Refs: #101679
2023-11-02 10:24:57 -05:00
Benjamin Trent
1722c0a9e9
Suppress this-escape warning for JDK21 (#101519)
* Suppress this-escape warning for JDK21

* add suppression to all ctors in esql Lexer & Parser
2023-10-31 10:01:57 -04:00
Simon Cooper
22381fd6a7
Refactor overrides of old Plugin.createComponents method to new services method (#101381) 2023-10-26 16:58:14 +01:00
Stuart Tettemer
aa7e92e29b
Metrics: Gauge Callbacks (#101286)
Replace synchronous gauges with gauges that accept an observer which is called once per reporting interval.

Calling code may register one observer at a time. If the observer is null, there is no observation in that reporting interval.
2023-10-25 14:26:47 -05:00
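The observer model can be sketched as below. `CallbackGauge` is hypothetical and not the Elasticsearch or OTel API: it holds at most one observer at a time, the reporting loop is assumed to call `collect()` once per interval, and a null observer yields no observation.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;

// Hypothetical gauge: at most one observer is registered at a time.
final class CallbackGauge {
    private final AtomicReference<LongSupplier> observer = new AtomicReference<>();

    // Replaces any previously registered observer; null unregisters it.
    void setObserver(LongSupplier newObserver) {
        observer.set(newObserver);
    }

    // Called once per reporting interval. No observer means no observation.
    Optional<Long> collect() {
        LongSupplier current = observer.get();
        return current == null ? Optional.empty() : Optional.of(current.getAsLong());
    }

    public static void main(String[] args) {
        CallbackGauge gauge = new CallbackGauge();
        System.out.println(gauge.collect()); // Optional.empty
        gauge.setObserver(() -> 42L);
        System.out.println(gauge.collect()); // Optional[42]
    }
}
```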
Stuart Tettemer
5b2c25f80b
Metrics Tests - Recording Otel Meter (#101281)
Adds an implementation of Otel's Meter that records all instrument calls made through the open telemetry interface.

This allows the registry to avoid mocking out otel in testing.

Updates the GaugeAdapterTests to use the recording meter.
2023-10-25 09:30:05 -05:00
Stuart Tettemer
d8b2c52c82
Metrics refactor - split registry and service (#101154)
This splits out the registry and the service, which makes testing easier and removes much of the delegation from the old `APMMeter` to `Instruments` (now renamed `APMMeterRegistry`).

APMMeterService takes care of the lifecycle and APMMeterRegistry holds the instruments.
2023-10-23 13:28:46 -05:00
Przemyslaw Gomulka
3465a2bf18
Fix metric gauge creation model (#100609)
OTel gauges should follow the callback model (or use BatchCallback); otherwise they will not be sent by
the APM Java agent.
This commit changes the gauge creation model to return Observable*Gauge
and uses AtomicLong/Double to store the current value, which is polled when
metrics are exported (and the callback is called).
2023-10-10 13:28:02 -05:00
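The "store in an atomic, read in the callback" model can be sketched as follows. `AtomicBackedGauge` is hypothetical, and the `LongConsumer` stands in for the exporter's measurement sink (an assumption, not the OTel `ObservableLongMeasurement` type): `record()` only stores the latest value, and the export-time callback reads it.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical sketch: a set-style gauge API backed by an AtomicLong.
final class AtomicBackedGauge {
    private final AtomicLong current = new AtomicLong();

    // Called by application code whenever the value changes.
    void record(long value) {
        current.set(value);
    }

    // Callback invoked at export time; reports the most recently recorded value.
    void onCollect(LongConsumer measurement) {
        measurement.accept(current.get());
    }

    public static void main(String[] args) {
        AtomicBackedGauge gauge = new AtomicBackedGauge();
        gauge.record(3);
        gauge.record(9);
        gauge.onCollect(value -> System.out.println("exported " + value)); // exported 9
    }
}
```

Only the last value recorded before each export is reported; intermediate values between exports are not kept.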
Simon Cooper
d81dbfa8da
Fix race condition in InstrumentsConcurrencyTests (#100518)
Fix a race condition between the two threads in
InstrumentsConcurrencyTests. If the second thread gets the lock first,
the test fails.

Fixes #100251
2023-10-09 11:57:13 -04:00
Stuart Tettemer
110dd5ed16
Tracing: Use doPriv when working with spans, use SpanId (#100232)
`SpanId` is used when explicitly closing the trace in `executeQueryPhase` to avoid double closing the associated task.

`doPrivileged` avoids hitting `java.lang.UnsupportedOperationException: Cannot define class using reflection: access denied ("java.lang.reflect.ReflectPermission" "suppressAccessChecks")` when classes are sometimes injected while switching spans.

Removed `default Releasable withScope(Task task)` from the Tracer API because it automatically created a span id and, in one of the three uses, that SpanId was necessary to close the span.

Fixes: #100072
2023-10-05 12:58:18 -05:00
Luca Cavanna
5bfaa7d2f0
Address bad merge
Adjust the RegExp import in APMTracer
2023-10-02 16:10:16 +02:00
Luca Cavanna
689a1e490a
Merge branch 'main' into lucene_snapshot_9_8 2023-10-02 13:56:12 +02:00
Lorenzo Dematté
cc572fd92d
Moved APM service version from Version to Build.version() (#100084) 2023-10-02 12:12:35 +02:00
Przemyslaw Gomulka
b856bf264d
Update the elastic-apm-agent version (#100064)
The latest version contains a fix that allows sending metrics to the APM server. It also adds the APM agent JVM option
"enable_experimental_instrumentations", "true",
which is required to enable the otel-metrics-instrumentation.

Relates to https://github.com/elastic/elasticsearch/pull/99832
2023-09-29 14:35:04 -05:00
Stuart Tettemer
f8d09e9c6c
APM Metering API (#99832)
Adds Metering instrument interfaces and adapter implementations for opentelemetry instrument types:
* Gauge - a single number that can go up or down
* Histogram - bucketed samples
* Counter - monotonically increasing summed value
* UpDownCounter - summed value that may decrease

Supports both Long* and Double* versions of the instruments.

Instruments can be registered and retrieved by name through APMMeter which is available via the APMTelemetryProvider.

The metering provider starts as the open telemetry noop provider.

`telemetry.metrics.enabled` turns on metering.
2023-09-28 19:35:46 -05:00
Luca Cavanna
15c87b681c
Merge branch 'main' into lucene_snapshot_9_8 2023-09-28 12:19:14 +02:00
Przemyslaw Gomulka
eca41871aa
Use TelemetryProvider in Plugin::createComponents (#99737)
To avoid adding yet another parameter to createComponents,
the Tracer interface is replaced with TelemetryProvider.
This allows getting both the Tracer and (in the future) Metric interfaces.
2023-09-22 14:48:11 +02:00
Przemyslaw Gomulka
0efa67821d
Rename TracerPlugin to TelemetryPlugin (#99735)
With the support of metrics, the TracerPlugin name is no longer adequate, so it is renamed to TelemetryPlugin.
This also introduces the TelemetryProvider interface. While it is only used in Node.java at the moment to fetch the Tracer instance, it is intended to be used in Plugin::createComponents (to be done in a separate commit due to
the broad scope of this method).
This will allow plugins to get access to both the Tracer and Metric interfaces
without the need to add yet another argument to createComponents.

Also adding an internal subpackage in module/apm so that it is more obvious
which packages are not exported.
2023-09-22 13:35:36 +02:00