Commit graph

42 commits

Author SHA1 Message Date
github-actions[bot]
ff37802a77
Unicode pipeline and plugin ids (#15971) (#16257)
* fix: restore support for unicode pipeline- and plugin-id's

JRuby's `Ruby#newSymbol(String)` throws an exception when provided a `String`
that contains characters outside of lower-ASCII because JRuby internals expect
"the incoming String to be one of our mangled ISO-8859-1 strings" as noted in
a comment on jruby/jruby#6217.

Instead, we use `Ruby#newString(String)` to create a new `RubyString` (which
works properly), and then rely on `RubyString#intern` to get our `RubySymbol`.

This fixes a regression introduced in the 8.7 series in which pipeline id's
are consistently represented as ruby symbols in the metrics store, and ensures
similar issue does not exist when specifying a plugin id that contains
characters above the lower-ASCII plane.

* fix: use properly-encoded RubySymbol in PipelineConfig

We cannot rely on `RubySymbol#toString` to produce a properly-encoded `String`
whe the string contains characters above the lower-ASCII plane because the
result is effectively a binary ruby-internal marshal of the bytes that only
holds when the symbol contains lower-ASCII.

Instead, we can use the internally-memoizing `RubySymbol#name` to get a
properly-encoded `RubyString`, and `RubyString#asJavaString()` to get a
properly-encoded java-`String`.

* fix: properly serialize unicode pipeline names in API output

Jackson's JSON serializer leaks the JRuby-internal byte structure of Symbols,
which only aligns with the byte-structure of the symbol's actual string when
that string is wholly-comprised of lower-ASCII characters.

By pre-converting Symbols to Strings, we ensure that the result is readable
and useful.

* spec: bypass monitoring specs for unicode pipeline ids when PQ enabled

(cherry picked from commit 0ec16ca398)

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
2024-06-25 09:46:55 -07:00
Ry Biesemeyer
38e8c5d3f9
flow_metrics: pull worker_utilization up to pipeline-level (#15912) 2024-02-06 11:50:34 -08:00
Andres Rodriguez
cf67cb1377
Rubocop: Enable most SpaceInside cops (#15201)
Enabled:
* SpaceInsideArrayLiteralBrackets
* SpaceInsideParens
* SpaceInsidePercentLiteralDelimiters
* SpaceInsideStringInterpolation
* Add enforced style for SpaceInsideStringInterpolation

Enabled without offenses:
* SpaceInsideArrayPercentLiteral
* Layout/SpaceInsideRangeLiteral
* Layout/SpaceInsideReferenceBrackets
2023-07-20 09:49:46 -04:00
Andres Rodriguez
2165d43e1a
Rubocop: Enable SpaceBefore cops (#15197)
Enables the following cops:

 * Layout/SpaceBeforeBlockBraces
 * Layout/SpaceBeforeBrackets
 * Layout/SpaceBeforeComma
 * Layout/SpaceBeforeComment
 * Layout/SpaceBeforeFirstArg
 * Layout/SpaceBeforeSemicolon
2023-07-18 22:32:17 -04:00
Andres Rodriguez
4255a8fd1c
Rubocop: Enable SpaceAround cops (#15196)
* Enable SpaceARoundBlockParameters
* Enable SpaceAroundEqualsInParameterDefault
* Enable SpaceAroundKeyword
* Enable SpaceAroundOperators
* Enable SpaceBeforeBlockBraces, which yields no changes
2023-07-18 21:11:57 -04:00
Andres Rodriguez
acd87a69e7
Rubocop: Enable various EmptyLine cops (#15194)
Disabled:
 * EmptyLineAfterGuardClause
 * EmptyLineAfterMultilineCondition
 * EmptyLinesAroundAccessModifier

Enabled:
 * Layout/EmptyLineAfterMagicComment
 * Layout/EmptyLineBetweenDefs
 * Layout/EmptyLines
 * Layout/EmptyLinesAroundArguments
 * Layout/EmptyLinesAroundAttributeAccessor
 * Layout/EmptyLinesAroundBeginBody
 * Layout/EmptyLinesAroundBlockBody
 * Layout/EmptyLinesAroundExceptionHandlingKeywords
 * Layout/EmptyLinesAroundMethodBody
 * Layout/EmptyLinesAroundModuleBody
2023-07-18 16:49:16 -04:00
Andres Rodriguez
5e34aacc6e
Enable trailing whitespace formating (#15174)
* Enable Layout/TrailingWhitespace cop formation
* Remove Trailing Whitespaces
2023-07-14 13:22:02 -04:00
Mashhur
cfafce23c6
Plugin throughput, worker utilization and worker cost per event flow metrics. (#14743)
* Initial effort to initialize plugin flow metrics. Followings are addressed:
- Namespace store is shaped with RubySymbol key but filter and output codecs were using string key. This commit intends to standardize the namespace key with RubySymbol for filter & output codecs.
- Initializes throughput flow metrics for the input plugins.
- Initializes the worker cost per event and worker utilization for the filter and output plugins with only uptime metrics but it should combine with worker count, will be implemented in next commits.
- Fetching codec ID generated in ruby scope is possible but problematic to in Java scope. We will skip codec flow metrics since they are rarely produce the hard times.

* Worker utilization metrics implementation.

- Worker count will be provided as a fraction to the flow metrics. At the time when we fetch the metric value, fraction is applied.

* Unit tests added for fractured extended & simple metrics.

* Code review change requests applied.

- To simplify the scale (or fraction) at metric get value time, we can introduce the wrapper (`UpScaleMetric`) that applies the scale at metric value fetch time.
- Unit test added for `UpScaleMetric`
- We don't touch the codec namespace shape for now since we skipped codec metrics.
- Unused sources removed.

* Worker utilization and worker cost per event explanation added in the documentation.

* Integration test added for plugin-level flow metrics.

* Apply suggestions from code review

- Integration test failure fix: input plugin ID is not always in context config.
- Suggestions to simplify integration test source and rollback to intentional namings.
- Metrics explanation improvement in the doc.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* plugin flow: fix units; pass UptimeMetric and scale when needed

Aligns the units of the newly-introduced plugin metrics with the specification,
and passes our `UptimeMetric` through to the individual helper methods so that
they can scale appropriately for their context and our type-checker can ensure
we don't receive an incorrectly-scaled `Metric<Long>`.

Input `throughput`
------------------

all throughput metrics should be expressed in events-per-second; this
per-plugin scoped view of the pipeline's `input_throughput` flow should be
expressed in the same units.

Filters, Outputs `worker_utilization`
-------------------------------------

> a worker_utilization (duration / (uptime * worker count)) shows what percent
> of available resources an individual plugin instance is taking and can help
> identify where the blocker is.

To achieve this, we need to divide millis used by _millis_ available.

Filters, Outputs `worker_cost_per_event`
----------------------------------------

> we also provide a (to be named) cost-per-event metric (duration / event) to
> surface issues with a plugin that operates on a very small subset of events
> (via conditionals) but contributes disproportionately to the cost of getting
> its events through.

We start with a baseline of seconds-per-event, and acknowledge that this may
need to be scaled to a more understandable number before merging.

* plugin flow: express cost per event in millis per event

The "worker cost per event" metric when expressed as an inverse per-worker
throughput in seconds-per-event produces a range of values that are not
particularly easy to compare at-a-glance, with "nearly free" operations
being expressed in negative-exponent scientific notation and extremely
expensive operations being expressed with single-digits.

By scaling this metric up by a factor of 1000 to "millis per event" or its
eqivalent "seconds per thousand events", the resulting numbers in practice
are easier to make sense of:

+------------------------+--------------+---------------+------------+
| EXAMPLE        / SCALE |    s/event   |    ms/event   |  µs/event  |
+------------------------+--------------+---------------+------------+
| no-op mutate @ 12k eps | 8.33e-05     |        0.0833 |       83.3 |
| stdout w/ dots codec   |     0.000831 |        0.831  |      831   |
| ES out 1s RTT/125      |     0.008    |        8      |     8000   |
| ES out 30s retries/125 |     0.24     |      240      |   240000   |
| ES filter 1s/event     |     1        |     1000      |  1000000   |
| grok 30s timeout       |    30        |    30000      | 30000000   |
+------------------------+--------------+---------------+------------+

* plugin flow: reshape docs

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
2022-12-16 14:11:21 -08:00
Ry Biesemeyer
9460d4d7fc
specs: assert presence of logging without risking NoMethodError (#14633)
* specs: assert presence of logging without risking NoMethodError

* Update qa/integration/specs/monitoring_api_spec.rb
2022-11-02 11:01:59 -07:00
Mashhur
f19e9cb647
Collect queue growth events and bytes metrics when PQ is enabled. (#14554)
* Collect growth events and bytes metrics if PQ is enabled: Java changes.

* Move queue flow under queue namespace.

* Pipeline level PQ flow metrics: add unit & integration tests.

* Include queue info in node stats sample.

* Apply suggestions from code review

Change uptime precision for PQ growth metrics to uptime seconds since PQ events are based on seconds.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Add safeguard when using lazy delegating gauge type.

* flow metrics: simplify generics of lazy implementation

Enables interface `FlowMetrics::create` to take suppliers that _implement_
a `Metric<? extends Number>` instead of requiring them to be pre-cast, and
avoid unnecessary exposure of the metrics value-type into our lazy init.

* flow metrics: use lazy init for PQ gauge-based metrics

* noop: use enum equality

Avoids routing two enum values through `MetricType#toString()`
and `String#equals()` when they can be compared directly.

* Apply suggestions from code review

Optional.ofNullable used for safe return. Doc includes real tested expected metric values.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* flow metrics: make lazy-init wraper inherit from AbstractMetric

this allows the Jackson serialization annotations to work

* flow metrics: move pipeline queue-based flows into pipeline flow namespace

* Follow up for moving PQ growth metrics under pipeline.*.flow.
- Unit and integration tests are added or fixed.
- Documentation added along with sample response data

* flow: pipeline pq flow rates docs

* Do not expect flow in the queue section of API. Metrics moved to flow section.

Update logstash-core/spec/logstash/api/commands/stats_spec.rb

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Integration test failure fix.

Mistake: `flow_status` should be `pipeline_flow_stats`

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Integration test failures fix.

Number should be Numeric in the ruby specs.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Make CI happy.

* api specs: use PQ only where needed

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
2022-10-13 15:30:31 -07:00
Ry Biesemeyer
6e0b365c92
Feature: flow metrics integration (#14518)
* Flow metrics: initial implementation (#14509)

* metrics: eliminate race condition when registering metrics

Ensure our fast-lookup and store tables cannot diverge in a race condition
by wrapping mutation of both in a single mutex and appropriately handle
another thread winning the race to the lock by using the value that it
persisted instead of writing our own.

* metrics: guard against intermediate namespace conflicts

 - ensures our safeguard that prevents using an existing metric as a namespace
   is applied to _intermediate_ nodes, not just the tail-node, eliminating a
   potential crash when sending `fetch_or_store` to a metric object that is not
   expected to respond to `fetch_or_store`.
 - uses the atomic `Concurrent::Map#compute_if_absent` instead of the
   non-atomic `Concurrent::Map#fetch_or_store`, which is prone to
   last-write-wins during contention (as-written, this method is only
   executed under lock and not subject to contention)
 - uses `Enumerable#reduce` to eliminate the need for recursion

* flow: introduce auto-advancing UptimeMetric

* flow: introduce FlowMetric with minimal current/lifetime rates

* flow: initialize pipeline metrics at pipeline start

* Controller and service layer implementation for flow metrics. (#14514)

* Controller and service layer implementation for flow metrics.

* Add flow metrics to unit test and benchmark cli definitions.

* flow: fix tests for metric types to accomodate new one

* Renaming concurrency and backpressure metrics.

Rename `concurrency` to `worker_concurrency ` and `backpressure` to `queue_backpressure` to provide proper scope naming.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* metric: register flow metrics only when we have a collector (#14529)

the collector is absent when the pipeline is run in test with a
NullMetricExt, or when the pipeline is explicitly configured to
not collect metrics using `metric.collect: false`.

* Unit tests and integration tests added for flow metrics. (#14527)

* Unit tests and integration tests added for flow metrics.

* Node stat spec and pipeline spec metric updates.

* Metric keys statically imported, implicit error expectation added in metric spec.

* Fix node status API spec after renaming flow metrics.

* Removing flow metric from PipelinesInfo DS (used in peridoci metric snapshot), integration QA updates.

* metric: register flow metrics only when we have a collector (#14529)

the collector is absent when the pipeline is run in test with a
NullMetricExt, or when the pipeline is explicitly configured to
not collect metrics using `metric.collect: false`.

* Unit tests and integration tests added for flow metrics.

* Node stat spec and pipeline spec metric updates.

* Metric keys statically imported, implicit error expectation added in metric spec.

* Fix node status API spec after renaming flow metrics.

* Removing flow metric from PipelinesInfo DS (used in peridoci metric snapshot), integration QA updates.

* Rebasing with feature branch.

* metric: register flow metrics only when we have a collector

the collector is absent when the pipeline is run in test with a
NullMetricExt, or when the pipeline is explicitly configured to
not collect metrics using `metric.collect: false`.

* Apply suggestions from code review

Integration tests updated to test capturing the flow metrics.

* Flow metrics expectation updated in tegration tests.

* flow: refine integration expectations for reloads/monitoring

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
Co-authored-by: Mashhur <mashhur.sattorov@gmail.com>

* metric: add ScaledView with sub-unit precision to UptimeMetric (#14525)

* metric: add ScaledView with sub-unit precision to UptimeMetric

By presenting a _view_ of our metric that maintains sub-unit precision,
we prevent jitter that can be caused by our periodic poller not running at
exactly our configured cadence.

This is especially important as the UptimeMetric is used as the _denominator_ of
several flow metrics, and a capture at 4.999s that truncates to 4s, causes the
rate to be over-reported by ~25%.

The `UptimeMetric.ScaledView` implements `Metric<Number>`, so its full
lossless `BigDecimal` value is accessible to our `FlowMetric` at query time.

* metrics: reduce window for too-frequent-captures bug and document it

* fixup: provide mocked clock to flow metric

* Flow metrics cleanup (#14535)

* flow metrics: code-style and readability pass

* remove unused imports

* cleanup: simplify usage of internal helpers

* flow: migrate internals to use OptionalDouble

* Flow metrics global (#14539)

* flow: add global top-level flows

* docs: add flow metrics

* Top level flow metrics unit tests added. (#14540)

* Top level flow metrics unit tests added.

* Add unit tests when config reloads, make sure top-level flow metrics didn't get reset.

* Apply suggestions from code review

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Validating against Hash test cases updated.

* For the safety check against exact type in unit tests.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* docs: section links and clarity in node stats API flow metrics

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
Co-authored-by: Mashhur <mashhur.sattorov@gmail.com>
2022-09-19 14:21:45 -07:00
kaisecheng
7641b076f4
fix monitoring api integration test with draining queue (#14106)
This commit ends the integration test with teardown instead of sending a signal to kill
Related: #13935
2022-05-16 23:26:24 +01:00
kaisecheng
0af9fb0d5f
Allow metrics update when PQ draining (#13935)
This commit moves the stop of metrics collection after pipelines shutdown to allow metrics update during PQ draining
Fixed: #13832
2022-05-16 15:36:56 +01:00
Andrea Selva
b6da829f4f
Avoid to increment event.out conter for dropped events (#13593)
Fixes the issue #8752 in event.out counter. When a pipeline contains a drop filter the total out events counter should count only the events that reached the out stage.

This PR changes CompiledExecution.compute() interface to return the number of events that effectively reached the end of the pipeline. This change is used in WorkerLoop to update correctly the event.out metric, instead of relying on the batch's size.
2022-01-14 15:52:06 +01:00
Karol Bucek
51fad3f56b
Test: improve monitoring api logging asserts (#13164) 2021-08-25 17:16:52 +02:00
João Duarte
e9c9865f40
Add apache and elastic license headers to source code files (#11673)
* add license header to ruby and java files
* add license header to erb and rake files
* add license headers to gradle files
2020-03-11 11:53:38 +00:00
andsel
f554930e81 Introduced DeprecationLogger for use in core code and exposed to Java and Ruby plugins. Closes 11049
Fixes #11260
2019-11-14 10:49:27 +00:00
Joao Duarte
1e461dee22 more resilient testing of logging level setting
Fixes #11230
2019-10-17 13:04:14 +00:00
Dan Hermann
73346d20f5 Update Reflections library
Fixes #10951
2019-07-16 11:59:29 +00:00
Dan Hermann
d2c7fc7eb5 Support setting logging back to defaults via the API.
Fixes #8384

Fixes #8786
2017-12-01 23:09:05 +00:00
Rob Bavey
8da0737252 DeadLetterQueueReader should handle missing segment files at startup
Change DeadLetterQueueReader, so that if a missing segment file is
encountered at startup, the next valid entry will be used instead

Fixes #7433

Fixes #7457
2017-08-14 20:07:26 +00:00
Jake Landis
3a6902c2f3 Bug Fix: Log messages from Java code base
This change re-configures default instead of creating a new context.

Fixes #7526

Fixes #7528
2017-07-06 16:53:08 +00:00
Rob Bavey
2d2cee4d2e Add Metrics for dead letter queue
Add an initial set of metrics for the dead letter queue.

Metrics are supplied under the pipeline in the following format:

"dead_letter_queue": {
        "queue_size_in_bytes": ...,
      }

Metrics are populated via a PeriodicPoller

Also fixed up calculation of currentQueueSize to take account
of version headers, which was previously being skipped.

Additionally, whether the dlq is enabled, and if so, the path
of the dlq is supplied under the pipelines API endpoint

Resolves #7287

Fixes #7338
2017-06-09 17:14:18 +00:00
Jake Landis
5b4a725a36 Fix #7277 Review updates
Fixes #7321
2017-06-09 14:56:40 +00:00
Jake Landis
55979c3a07 Bug Fix: Fix ability to change log level via the API
The ability to change the log level as described at https://www.elastic.co/guide/en/logstash/current/logging.html#_logging_apis appears to no longer work. The root cause of this issue is that log4j2 context we initialize is different from the log4j2 context that we are setting via the API. The fix here is to mirror the functionality of org.apache.logging.log4j.core.config.Configurator.setLevel Java method, but instead use the logging_context initialized in our code via JRuby.

Fixes #7277

Fixes #7321
2017-06-09 14:56:40 +00:00
Joao Duarte
bed8b8a084 support multiple pipelines in one logstash instance
* add multi_local source for multi pipelines
* introduce pipelines.yml
* introduce PipelineSettings class
* support reloading of pipeline parameters
* fix pipeline api call for _node/pipelines
* inform user pipelines.yml is ignored if -e or -f is enabled
2017-05-30 09:47:53 +01:00
Joao Duarte
412dd7bc08 add ExpectationNotMetError to Stud.try on monitoring_api_spec
in some tests against metrics api it's possible the values aren't there,
making the assertion fail. By default Stud.try only catches StandardError,
but rspec throws a RSpec::Expectations::ExpectationNotMetError on a failed
assertion, and this exception inherits directly from Exception.

This commit explicitly adds this exception to the list of Stud.try exceptions

Fixes #7177
2017-05-22 13:46:30 +00:00
Joao Duarte
c3df20b7cf protect monitoring_api_spec against timing issues
Fixes #7145
2017-05-21 18:52:14 +00:00
Armin
2ac72226d3 #7091 Dockerized Kafka IT fixture
Fixes #7093
2017-05-13 22:46:24 +00:00
Joao Duarte
b7f4f9d649 fix remaining race condition in monitoring api integration spec
* add NoMethodError to list of allowed exceptions in try
2017-05-11 18:25:13 +01:00
Joao Duarte
838f593d1a fix monitoring api integration race condition
because the metric subsystem may not be ready yet when the tests ask for
the values, the test code may generate a NoMethodError accessing nested
hashes.

The NoMethodError exception aborts the try mechanism instead of retrying.

This PR rescues potential exceptions and adds more expectations to
protect the try block.

Fixes #7074
2017-05-11 13:06:58 +00:00
Joao Duarte
2b7b8f0d81 make integration specs a bit more resilient
* wait for logstash starting in monitoring_api integration tests
* remove initialization code from monitoring api integration spec
2017-05-11 10:24:57 +01:00
Wainer dos Santos Moschetta
6b282234cb ci: fix run single specs
Some integration spec tests fail to run
individually due missing require modules.

This fix it by ensuring all specs and helpers
require the needed modules.

Fixes #7037

Fixes #7038
2017-05-06 01:30:33 +00:00
Tal Levy
641b855127 remove current_size_in_bytes and acked info from node stats
re #6508.

- removed `acked_count`, `unacked_count`, and migrated `unread_count` to
top-level `events` field.
- removed `current_size_in_bytes` info from queue node stats

Fixes #6510
2017-01-09 19:19:28 -05:00
Tal Levy
2b45a9b4ae add queue stats to node/stats api (#6331)
* add queue stats to node/stats api

example queue section:

```
"queue" : {
    "type" : "persisted",
    "capacity" : {
      "page_capacity_in_bytes" : 262144000,
      "max_queue_size_in_bytes" : 1073741824,
      "max_unread_events" : 0
    },
    "data" : {
      "free_space_in_bytes" : 33851523072,
      "current_size_in_bytes" : 262144000,
      "storage_type" : "hfs",
      "path" : "/logstash/data/queue"
    },
    "events" : {
      "acked_count" : 0,
      "unread_count" : 0,
      "unacked_count" : 0
    }
}
```

Closes #6182.

* migrate to use period metric pollers for storing queue stats per pipeline
2016-12-15 13:14:47 -08:00
Suyog Rao
183ab07ec0 shutdown after spec
Fixes #6236
2016-11-11 11:52:18 -05:00
Suyog Rao
e78fce609b Add JVM uptime stats
Fixes #6236
2016-11-11 11:52:18 -05:00
Suyog Rao
75d70c6a20 Add new tests and some helpers
Fixes #6031
2016-10-14 18:27:44 -04:00
Suyog Rao
dfd6164edd changes after review
Fixes #5917
2016-09-22 13:43:48 -04:00
Suyog Rao
4008a7ad7e Fixes after @ph review
Fixes #5917
2016-09-22 13:43:48 -04:00
Suyog Rao
d71112da0e Cleanup of print messages
Fixes #5917
2016-09-22 13:43:47 -04:00
Suyog Rao
bb4189cdc8 Add initial round of integration tests
Fixes #5917
2016-09-22 13:43:47 -04:00