* geoip: extract database manager to stand-alone feature
Introduces an Elastic-licensed GeoipDatabaseManagement tool that can be used
by ANY plugin running on Elastic-licensed Logstash to retrieve a subscription
to a GeoIP database that ensures EULA-compliance and frequent updates, and
migrates the previous Elastic-licensed code-in-Logstash-core extension to
the Geoip Filter to use this new tool, requiring ZERO changes to in-the-wild
versions of the plugin.
The implementation of the new tool follows the previous implementation as
closely as possible, but presents a new interface that ensures that a
consumer can ATOMICALLY subscribe to a database path without risk that the
subscriber will receive an update or expiry before it is finished applying
the initial value:
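~~~ ruby
# obtain the shared manager and subscribe to the 'City' database
geoip_manager = LogStash::GeoipDatabaseManagement::Manager.instance
subscription = geoip_manager.subscribe('City')

# the three callbacks receive the initial value, updates, and expiry
subscription.observe(construct: ->(initial_dbinfo){ },
                     on_update: ->(updated_dbinfo){ },
                     on_expire: ->( _ ){ })

# release the subscription when the consumer is done with it
subscription.release!
~~~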
* docs: link in geoip database manager docs
* docs: reorganize pending 'geoip database management' feature
* docs: link to geoip pages from feature index
* geoip: add SubscriptionObserver "interface"
simplifies using Subscription#observe from Java
* geoip: fixup SubscriptionObserver after rename
* geoip: quacking like a SubscriptionObserver is enough
* geoip: simplify constants of legacy geoip filter extension
* geoip: bump logging level to debug for non-actionable log
* geoip: refine log message to omit non-actionable info
* re-enable invokedynamic (was disabled to avoid upstream bug)
* geoip: resolve testing fall-out from filter extension's "private" constants removal
* geoip: consistently use `DataPath#resolve` internally, too
* Add known imap and email plugin issues section to Logstash 8.10+ versions.
Co-authored-by: Mashhur <mashhur.sattorov@elastic.co>
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
(cherry picked from commit a88f82e77f)
Expands the description of memory used by Logstash, dividing it into heap and non-heap; describes in detail which parts compose the non-heap and how to size it, and lists the JVM settings that can be used to properly define this space.
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Since the DRA build for 8.10.1 was made with 82daae80bb, this fix didn't get in.
(cherry picked from commit aa9265665e)
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
This commit adds missing Elasticsearch SSL settings and replaces deprecated options being used on `xpack.monitoring.*` and `xpack.management.*` settings:
Changes:
- Updated deprecated monitoring and management Elasticsearch's SSL settings so no warnings are logged.
- Added monitoring settings support for file-based certificates and for the cipher suites: `xpack.monitoring.elasticsearch.ssl.certificate`, `xpack.monitoring.elasticsearch.ssl.key`, and `xpack.monitoring.elasticsearch.ssl.cipher_suites`.
- Added management settings support for file-based certificates and for the cipher suites: `xpack.management.elasticsearch.ssl.certificate`, `xpack.management.elasticsearch.ssl.key`, and `xpack.management.elasticsearch.ssl.cipher_suites`.
* docs: fix example block syntax types and truncations
* docs: provide wrapping hints to flow metric tables
* docs: refresh node stats api response examples
include only `current` and `lifetime` metrics that are GA, and not
technology preview metrics.
* docs: use "m(onospace)" modifier for metric name columns
* docs: swap literal column to first
relies on `#guide table td:first-child .literal` having `white-space: nowrap`
Reject illegal values assigned to the `tags` field. The top-level `tags` field should only accept a string or an array of strings.
When `tags` receives an illegal value on event creation, LogStash::Event renames the field to `_tags` and adds the tag `_tagsparsefailure` to `tags`.
When `tags` receives an illegal value in a `set` operation, LogStash::Event throws an exception.
Add a flag `--event_api.tags.illegal` to allow falling back to the old logic. There are two options:
`warn` - the old flow that allows illegal value assignment to the tags field.
`rename` - the new flow. This is the default value in 8.7.
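The two modes can be pictured with a toy model. This is only a sketch: a plain Hash stands in for `LogStash::Event`, and the helpers `legal_tags?` and `build_event` are hypothetical names, not Logstash APIs. It covers only the event-creation path, not the `set` path.

```ruby
# Toy model only: a plain Hash stands in for LogStash::Event.
# `legal_tags?` and `build_event` are hypothetical helpers, not Logstash APIs.

# Legal values are a String or an Array containing only Strings.
def legal_tags?(value)
  value.is_a?(String) || (value.is_a?(Array) && value.all? { |v| v.is_a?(String) })
end

# mode is "rename" (new behavior) or "warn" (old behavior, value kept as-is).
def build_event(data, mode: "rename")
  event = data.dup
  return event unless event.key?("tags")
  return event if legal_tags?(event["tags"]) || mode == "warn"

  # Move the illegal value aside and flag the failure.
  event["_tags"] = event.delete("tags")
  event["tags"] = ["_tagsparsefailure"]
  event
end

renamed = build_event({ "tags" => { "nested" => "map" } })
legacy  = build_event({ "tags" => { "nested" => "map" } }, mode: "warn")
```

In this model `renamed` carries the original value under `_tags`, while `legacy` is left unchanged.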
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
* Forward port of 8.6 and 8.5.3 release notes to main
* Better phrasing for pipeline level metrics feature
Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
* Initial effort to initialize plugin flow metrics. The following are addressed:
- The namespace store is keyed by RubySymbol, but filter and output codecs were using string keys. This commit standardizes the namespace key on RubySymbol for filter & output codecs.
- Initializes throughput flow metrics for the input plugins.
- Initializes the worker cost per event and worker utilization for the filter and output plugins with only uptime metrics; these should be combined with the worker count, which will be implemented in subsequent commits.
- Fetching a codec ID generated in Ruby scope is possible but problematic in Java scope. We will skip codec flow metrics since codecs rarely cause problems.
* Worker utilization metrics implementation.
- Worker count will be provided as a fraction to the flow metrics. At the time when we fetch the metric value, fraction is applied.
* Unit tests added for fractured extended & simple metrics.
* Code review change requests applied.
- To simplify the scale (or fraction) at metric get value time, we can introduce the wrapper (`UpScaleMetric`) that applies the scale at metric value fetch time.
- Unit test added for `UpScaleMetric`
- We don't touch the codec namespace shape for now since we skipped codec metrics.
- Unused sources removed.
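The scale-at-read idea can be sketched in Ruby as follows. The real `UpScaleMetric` is Java and its interface is not shown in this message, so `ScaledAtRead` and `Counter` are hypothetical stand-ins used only to illustrate the pattern.

```ruby
# Hypothetical Ruby analogue of a scale-at-read wrapper; not the real class.
class ScaledAtRead
  def initialize(metric, factor)
    @metric = metric   # anything responding to #value
    @factor = factor
  end

  # The multiplication happens on every read, so the wrapper always
  # reflects the wrapped metric's latest value.
  def value
    @metric.value * @factor
  end
end

# Minimal mutable metric used for the demonstration.
Counter = Struct.new(:value)

uptime_millis = Counter.new(1_000)
scaled = ScaledAtRead.new(uptime_millis, 4)   # e.g. four workers
first_read = scaled.value
uptime_millis.value = 2_500
second_read = scaled.value
```

Because nothing is cached, the second read picks up the new underlying value without any extra bookkeeping.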
* Worker utilization and worker cost per event explanation added in the documentation.
* Integration test added for plugin-level flow metrics.
* Apply suggestions from code review
- Integration test failure fix: input plugin ID is not always in context config.
- Suggestions to simplify integration test source and rollback to intentional namings.
- Metrics explanation improvement in the doc.
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* plugin flow: fix units; pass UptimeMetric and scale when needed
Aligns the units of the newly-introduced plugin metrics with the specification,
and passes our `UptimeMetric` through to the individual helper methods so that
they can scale appropriately for their context and our type-checker can ensure
we don't receive an incorrectly-scaled `Metric<Long>`.
Input `throughput`
------------------
all throughput metrics should be expressed in events-per-second; this
per-plugin scoped view of the pipeline's `input_throughput` flow should be
expressed in the same units.
Filters, Outputs `worker_utilization`
-------------------------------------
> a worker_utilization (duration / (uptime * worker count)) shows what percent
> of available resources an individual plugin instance is taking and can help
> identify where the blocker is.
To achieve this, we need to divide millis used by _millis_ available.
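As a worked example of the formula quoted above, here is a minimal sketch; the method name and argument names are assumptions made for illustration.

```ruby
# Sketch only: returns the percentage of available worker time
# that a single plugin instance has consumed.
def worker_utilization(duration_millis, uptime_millis, worker_count)
  available_millis = uptime_millis * worker_count
  return nil if available_millis.zero?   # nothing to report before any uptime accrues
  100.0 * duration_millis / available_millis
end

# A filter that consumed 30s of worker time over 60s of uptime on 4 workers:
utilization = worker_utilization(30_000, 60_000, 4)
```

Both the numerator and the denominator are in millis, so the units cancel and the result is a plain percentage.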
Filters, Outputs `worker_cost_per_event`
----------------------------------------
> we also provide a (to be named) cost-per-event metric (duration / event) to
> surface issues with a plugin that operates on a very small subset of events
> (via conditionals) but contributes disproportionately to the cost of getting
> its events through.
We start with a baseline of seconds-per-event, and acknowledge that this may
need to be scaled to a more understandable number before merging.
* plugin flow: express cost per event in millis per event
The "worker cost per event" metric when expressed as an inverse per-worker
throughput in seconds-per-event produces a range of values that are not
particularly easy to compare at-a-glance, with "nearly free" operations
being expressed in negative-exponent scientific notation and extremely
expensive operations being expressed with single-digits.
By scaling this metric up by a factor of 1000 to "millis per event" or its
equivalent "seconds per thousand events", the resulting numbers in practice
are easier to make sense of:
+------------------------+----------+----------+----------+
| EXAMPLE / SCALE        | s/event  | ms/event | µs/event |
+------------------------+----------+----------+----------+
| no-op mutate @ 12k eps | 8.33e-05 | 0.0833   | 83.3     |
| stdout w/ dots codec   | 0.000831 | 0.831    | 831      |
| ES out 1s RTT/125      | 0.008    | 8        | 8000     |
| ES out 30s retries/125 | 0.24     | 240      | 240000   |
| ES filter 1s/event     | 1        | 1000     | 1000000  |
| grok 30s timeout       | 30       | 30000    | 30000000 |
+------------------------+----------+----------+----------+
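The conversions in the table can be reproduced with a few lines of Ruby. This is a sketch: `cost_scales` is a hypothetical helper, and it reads "1s RTT/125" as one second of round-trip time shared by 125 events, which matches the 0.008 s/event figure.

```ruby
# Sketch: convert seconds-per-event to the other scales in the table.
def cost_scales(seconds_per_event)
  {
    s_per_event:  seconds_per_event,
    ms_per_event: seconds_per_event * 1_000,
    us_per_event: seconds_per_event * 1_000_000,
  }
end

# "ES out 1s RTT/125": one second shared by 125 events (assumed reading).
es_out = cost_scales(Rational(1, 125))
# "no-op mutate @ 12k eps": inverse of 12,000 events per second.
noop = cost_scales(Rational(1, 12_000))
```

Rationals keep the arithmetic exact; call `to_f` on a value when a decimal is wanted for display.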
* plugin flow: reshape docs
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
* Collect growth events and bytes metrics if PQ is enabled: Java changes.
* Move queue flow under queue namespace.
* Pipeline level PQ flow metrics: add unit & integration tests.
* Include queue info in node stats sample.
* Apply suggestions from code review
Change uptime precision for PQ growth metrics to uptime seconds since PQ events are based on seconds.
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* Add safeguard when using lazy delegating gauge type.
* flow metrics: simplify generics of lazy implementation
Enables interface `FlowMetrics::create` to take suppliers that _implement_
a `Metric<? extends Number>` instead of requiring them to be pre-cast, and
avoid unnecessary exposure of the metrics value-type into our lazy init.
* flow metrics: use lazy init for PQ gauge-based metrics
* noop: use enum equality
Avoids routing two enum values through `MetricType#toString()`
and `String#equals()` when they can be compared directly.
* Apply suggestions from code review
Optional.ofNullable used for safe return. Doc includes real tested expected metric values.
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* flow metrics: make lazy-init wrapper inherit from AbstractMetric
this allows the Jackson serialization annotations to work
* flow metrics: move pipeline queue-based flows into pipeline flow namespace
* Follow up for moving PQ growth metrics under pipeline.*.flow.
- Unit and integration tests are added or fixed.
- Documentation added along with sample response data
* flow: pipeline pq flow rates docs
* Do not expect flow in the queue section of API. Metrics moved to flow section.
Update logstash-core/spec/logstash/api/commands/stats_spec.rb
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* Integration test failure fix.
Mistake: `flow_status` should be `pipeline_flow_stats`
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* Integration test failures fix.
Number should be Numeric in the ruby specs.
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* Make CI happy.
* api specs: use PQ only where needed
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
- Adds a new method to the public API interface
- Pass the call through the JavaFilterDelegatorExt
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* flow metrics: extract to interface, sharable-common base, and implementation
In preparation of landing an additional implementation of FlowMetric, we
shuffle the current parts net-unchanged to provide interfaces for `FlowMetric`
and `FlowCapture`, along with a sharable-common `BaseFlowMetric`, and move
our initial implementation to a new `SimpleFlowMetric`, accessible only
through a static factory method on our new `FlowMetric` interface.
* flow-rates: refactor LIFETIME up to sharable base
* util: add SetOnceReference
* flow metrics: tolerate unavailable captures
While the metrics we capture from in the initial release of FlowMetrics
are all backed by `Metric<T extends Number>` whose values are non-null,
we will need to capture from nullable `Gauge<Number>` in order to
support persistent queue size and capacity metrics. This refactor uses
the newly-introduced `SetOnceReference` to defer our baseline lifetime
capture until one is available, and ensures `BaseFlowMetric#doCapture`
creates a capture if-and-only-if non-null values are available from
the provided metrics.
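The deferred-baseline idea can be sketched in Ruby. The real `SetOnceReference` is a Java utility whose methods are not shown in this message, so `SetOnce`, `offer`, and `capture` below are hypothetical names used for illustration only.

```ruby
# Hypothetical Ruby analogue of a set-once reference; not the real class.
class SetOnce
  def initialize
    @mutex = Mutex.new
    @set = false
    @value = nil
  end

  # Stores the value only if nothing has been stored yet.
  # Returns true when this call won, false otherwise.
  def offer(value)
    @mutex.synchronize do
      return false if @set
      @value = value
      @set = true
    end
  end

  def value
    @mutex.synchronize { @value }
  end
end

# Capture only when every source metric currently has a value,
# and remember the first successful capture as the baseline.
def capture(baseline, numerator, denominator)
  return nil if numerator.nil? || denominator.nil?
  snapshot = [numerator, denominator]
  baseline.offer(snapshot)
  snapshot
end

baseline = SetOnce.new
skipped = capture(baseline, nil, 5)   # gauge not yet populated
first   = capture(baseline, 10, 5)
second  = capture(baseline, 30, 10)
```

The skipped capture leaves the baseline unset, so the first capture with real values becomes the baseline and later captures cannot overwrite it.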
* flow rates: limit precision for readability
* flow metrics: introduce policy-driven extended windows implementation
The new ExtendedFlowMetric is an alternate implementation of the FlowMetric
introduced in Logstash 8.5.0 that is capable of producing windows for a set of
policies, which dictate the desired retention for the rate along with a
desired resolution.
- `current`: 10s retention, 1s resolution [*]
- `last_1_minute`: one minute retention, at 3s resolution [*]
- `last_5_minutes`: five minutes retention, at 15s resolution
- `last_15_minutes`: fifteen minutes retention, at 30s resolution
- `last_1_hour`: one hour retention, at 60s resolution
- `last_24_hours`: one day retention at 15 minute resolution
A given series may report a range for slightly longer than its configured
retention period, up to either the series' configured resolution or
our capture rate (currently ~5s), whichever is greater. This approach
allows us to retain sufficient data-points to present meaningful rolling
averages while ensuring that our memory footprint is bounded.
When recording these captures, we first stage the newest capture, and then
promote the previously-staged capture to the tail of a linked list IFF
the gap between our new capture and the newest promoted capture is larger
than our desired resolution.
When _reading_ these rates, we compact the head of that linked list forward
in time as far as possible without crossing the desired retention barrier,
at which point the head points to the youngest record that is old enough
to satisfy the period for the series.
We also occasionally compact the head during writes, but only if the head
is significantly out-of-date relative to the allowed retention.
As implemented here, these extended flow rates are on by default, but can be
disabled by setting the JVM system property `-Dlogstash.flowMetric=simple`
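The stage-then-promote write path and the compact-on-read path described above can be sketched for a single series. This is a simplified model, not the Java implementation: an Array stands in for the linked list, compaction during writes is omitted, and `Capture` and `RetentionSeries` are hypothetical names.

```ruby
# Simplified sketch of one retention series; all names are assumptions.
Capture = Struct.new(:time, :count)

class RetentionSeries
  def initialize(retention:, resolution:)
    @retention = retention
    @resolution = resolution
    @promoted = []   # oldest first
    @staged = nil
  end

  # Stage the newest capture; promote the previously staged one only if
  # the new capture is far enough past the newest promoted capture.
  def record(capture)
    previous = @staged
    @staged = capture
    return if previous.nil?
    newest = @promoted.last
    if newest.nil? || (capture.time - newest.time) > @resolution
      @promoted << previous
    end
  end

  # Drop leading captures while the one after them is still old enough
  # to cover the retention period, then report the rate since the head.
  def rate(now_capture)
    while @promoted.size > 1 && (now_capture.time - @promoted[1].time) >= @retention
      @promoted.shift
    end
    head = @promoted.first
    return nil if head.nil? || now_capture.time == head.time
    (now_capture.count - head.count).to_f / (now_capture.time - head.time)
  end

  def promoted_times
    @promoted.map(&:time)
  end
end

# One capture per second for 20 seconds at a steady 100 events per second.
series = RetentionSeries.new(retention: 10, resolution: 3)
(0..20).each { |t| series.record(Capture.new(t, 100 * t)) }
current_rate = series.rate(Capture.new(20, 2_000))
```

After the read, the head sits at the youngest capture that is at least `retention` old, so the reported window is slightly longer than the configured retention, as the text describes.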
* flow metrics: provide lazy-initialized implementation
* flow metrics: append lifetime baseline if available during init
* flow metric tests: continuously monitor combined capture count
* collection of unrelated minor code-review fixes
* collection of even more unrelated minor code-review fixes
* Flow metrics: initial implementation (#14509)
* metrics: eliminate race condition when registering metrics
Ensure our fast-lookup and store tables cannot diverge in a race condition
by wrapping mutation of both in a single mutex and appropriately handle
another thread winning the race to the lock by using the value that it
persisted instead of writing our own.
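A minimal sketch of that pattern using only the standard library; `Registry` and its two tables are hypothetical stand-ins for the fast-lookup and store structures, not the actual Logstash classes.

```ruby
# Sketch: both tables are mutated under one lock, and a thread that loses
# the race reuses the value the winner persisted.
class Registry
  def initialize
    @lock = Mutex.new
    @fast_lookup = {}
    @store = {}
  end

  def fetch_or_store(key)
    @lock.synchronize do
      # Another thread may have won the race to the lock: reuse its value.
      existing = @fast_lookup[key]
      next existing if existing

      created = yield
      @store[key] = created
      @fast_lookup[key] = created
      created
    end
  end
end

registry = Registry.new
threads = Array.new(8) { Thread.new { registry.fetch_or_store(:events) { Object.new } } }
results = threads.map(&:value)
```

Every thread ends up holding the same object, because only the first one through the lock runs its block.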
* metrics: guard against intermediate namespace conflicts
- ensures our safeguard that prevents using an existing metric as a namespace
is applied to _intermediate_ nodes, not just the tail-node, eliminating a
potential crash when sending `fetch_or_store` to a metric object that is not
expected to respond to `fetch_or_store`.
- uses the atomic `Concurrent::Map#compute_if_absent` instead of the
non-atomic `Concurrent::Map#fetch_or_store`, which is prone to
last-write-wins during contention (as-written, this method is only
executed under lock and not subject to contention)
- uses `Enumerable#reduce` to eliminate the need for recursion
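The intermediate-node guard and the `reduce`-based walk can be sketched with plain Hashes. This is an illustration only: the real code operates on `Concurrent::Map` under a lock, and `resolve_namespace` and `NamespaceConflict` are hypothetical names.

```ruby
# Sketch with plain Hashes; names are assumptions, not Logstash APIs.
class NamespaceConflict < StandardError; end

def resolve_namespace(root, path)
  path.reduce(root) do |node, key|
    child = (node[key] ||= {})
    # Checked at every step, so an intermediate metric is caught too.
    raise NamespaceConflict, "#{key} is a metric, not a namespace" unless child.is_a?(Hash)
    child
  end
end

store = { stats: { events: 42 } }   # :events is a metric (non-Hash leaf)
pipeline_ns = resolve_namespace(store, [:stats, :pipelines, :main])

# Using the :events metric as an intermediate namespace must fail.
conflict = begin
  resolve_namespace(store, [:stats, :events, :in])
  false
rescue NamespaceConflict
  true
end
```

Walking the path with `reduce` keeps the guard in one place and avoids a recursive helper.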
* flow: introduce auto-advancing UptimeMetric
* flow: introduce FlowMetric with minimal current/lifetime rates
* flow: initialize pipeline metrics at pipeline start
* Controller and service layer implementation for flow metrics. (#14514)
* Controller and service layer implementation for flow metrics.
* Add flow metrics to unit test and benchmark cli definitions.
* flow: fix tests for metric types to accommodate new one
* Renaming concurrency and backpressure metrics.
Rename `concurrency` to `worker_concurrency` and `backpressure` to `queue_backpressure` to provide proper scope naming.
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* metric: register flow metrics only when we have a collector (#14529)
the collector is absent when the pipeline is run in test with a
NullMetricExt, or when the pipeline is explicitly configured to
not collect metrics using `metric.collect: false`.
* Unit tests and integration tests added for flow metrics. (#14527)
* Unit tests and integration tests added for flow metrics.
* Node stat spec and pipeline spec metric updates.
* Metric keys statically imported, implicit error expectation added in metric spec.
* Fix node status API spec after renaming flow metrics.
* Removing flow metric from PipelinesInfo DS (used in periodic metric snapshot), integration QA updates.
* Rebasing with feature branch.
* Apply suggestions from code review
Integration tests updated to test capturing the flow metrics.
* Flow metrics expectation updated in integration tests.
* flow: refine integration expectations for reloads/monitoring
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
Co-authored-by: Mashhur <mashhur.sattorov@gmail.com>
* metric: add ScaledView with sub-unit precision to UptimeMetric (#14525)
* metric: add ScaledView with sub-unit precision to UptimeMetric
By presenting a _view_ of our metric that maintains sub-unit precision,
we prevent jitter that can be caused by our periodic poller not running at
exactly our configured cadence.
This is especially important as the UptimeMetric is used as the _denominator_ of
several flow metrics, and a capture at 4.999s that truncates to 4s causes the
rate to be over-reported by ~25%.
The `UptimeMetric.ScaledView` implements `Metric<Number>`, so its full
lossless `BigDecimal` value is accessible to our `FlowMetric` at query time.
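The ~25% figure can be checked with a short piece of arithmetic. The event count below is an arbitrary assumption; only the 4.999s and 4s values come from the text.

```ruby
# Worked arithmetic only; exact Rationals avoid float noise.
events = 5_000                              # arbitrary event count (assumption)
true_uptime_s = Rational(4999, 1000)        # capture taken at 4.999s
truncated_uptime_s = true_uptime_s.floor    # whole seconds only, i.e. 4

precise_rate = events / true_uptime_s
truncated_rate = Rational(events, truncated_uptime_s)
over_report_pct = ((truncated_rate / precise_rate - 1) * 100).to_f.round(1)
```

The over-report depends only on the ratio of the two denominators, so the result is the same for any event count.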
* metrics: reduce window for too-frequent-captures bug and document it
* fixup: provide mocked clock to flow metric
* Flow metrics cleanup (#14535)
* flow metrics: code-style and readability pass
* remove unused imports
* cleanup: simplify usage of internal helpers
* flow: migrate internals to use OptionalDouble
* Flow metrics global (#14539)
* flow: add global top-level flows
* docs: add flow metrics
* Top level flow metrics unit tests added. (#14540)
* Top level flow metrics unit tests added.
* Add unit tests when config reloads, make sure top-level flow metrics didn't get reset.
* Apply suggestions from code review
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* Validating against Hash test cases updated.
* For the safety check against exact type in unit tests.
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
* docs: section links and clarity in node stats API flow metrics
Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
Co-authored-by: Mashhur <mashhur.sattorov@gmail.com>
These are taken from the filebeat-modules doc with minor changes to
localise to winlogbeat and to remove the fully worked example.
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>