Because logging configuration occurs after loading the `logstash.yml`
settings, deprecation logs from `LogStash::Settings::DeprecatedAlias#set` are
effectively emitted to a null logger and lost.
By re-emitting after the post-process hooks, we can ensure that they make
their way to the deprecation log. This change adds support for any setting
that responds to `Object#observe_post_process` to have that hook invoked after
all post-processing hooks have been executed.
Resolves: elastic/logstash#16332
(cherry picked from commit c633ad2568)
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
This commit improves error handling when pipelines that are too big hit the `Xss` limit and throw a `StackOverflowError`. Currently the exception is printed outside of the logger and doesn't even show up when `log.format` is `json`, leaving the user to wonder what happened.
A couple of thoughts on the way this is implemented:
* There should be a first barrier to handle pipelines that are too large based on the PipelineIR compilation. The barrier would use the detection of Xss to determine how big a pipeline could be. This however doesn't reduce the need to still handle a StackOverflow if it happens.
* The catching of StackOverflowError could also be done in the WorkerLoop. However, I'd suggest that this is unrelated to the worker initialization itself; it just so happens that `compiledPipeline.buildExecution` is computed inside the WorkerLoop class for performance reasons. So I'd prefer the logging to come not from the existing catch, but from a dedicated catch clause, as sketched below.
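A rough, hedged sketch of such a dedicated catch clause (the wrapper class, method, and log message below are hypothetical, not the actual worker code):

```java
import java.util.function.Supplier;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

final class ExecutionBuilder {
    private static final Logger LOGGER = LogManager.getLogger(ExecutionBuilder.class);

    // hypothetical wrapper around the buildExecution call performed for each worker
    static <T> T buildOrExplain(final String pipelineId, final Supplier<T> buildExecution) {
        try {
            return buildExecution.get();
        } catch (StackOverflowError e) {
            // go through log4j so the message also shows up when log.format=json
            LOGGER.error("Pipeline {} is too big to compile within the JVM thread stack size (Xss); "
                    + "increase -Xss or split the pipeline", pipelineId, e);
            throw new IllegalStateException("Unable to build execution for pipeline " + pipelineId, e);
        }
    }
}
```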
Solves #16320
(cherry picked from commit 8f2dae618c)
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
This commit removes Debian 10 (Buster), which has been EOL
since July 1, 2024[^1], from CI.
Relates https://github.com/elastic/ingest-dev/issues/2872
(cherry picked from commit f728c44a0a)
Co-authored-by: Dimitrios Liappis <dimitrios.liappis@gmail.com>
Now that we have custom VM images for Ubuntu 24.04, this commit adds
CI for Ubuntu 24.04.
This is a revert of #16279
(cherry picked from commit ea0c16870f)
Co-authored-by: Dimitrios Liappis <dimitrios.liappis@gmail.com>
This commit fixes the configuration reload process to clean up the pipeline's metric store, so it does not retain references to failed pipelines' components.
* Add RubyEvent#dup support and unit test case to keep Json#dump(Event) safe.
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
* fix: restore support for unicode pipeline- and plugin-id's
JRuby's `Ruby#newSymbol(String)` throws an exception when provided a `String`
that contains characters outside of lower-ASCII because JRuby internals expect
"the incoming String to be one of our mangled ISO-8859-1 strings" as noted in
a comment on jruby/jruby#6217.
Instead, we use `Ruby#newString(String)` to create a new `RubyString` (which
works properly), and then rely on `RubyString#intern` to get our `RubySymbol`.
This fixes a regression introduced in the 8.7 series in which pipeline id's
are consistently represented as ruby symbols in the metrics store, and ensures
a similar issue does not exist when specifying a plugin id that contains
characters above the lower-ASCII plane.
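A minimal standalone sketch of that workaround (using a throwaway JRuby runtime instead of Logstash's own; the method names follow the description above, and exact signatures can differ between JRuby versions):

```java
import org.jruby.Ruby;
import org.jruby.RubyString;
import org.jruby.RubySymbol;

public class UnicodeSymbolCreate {
    public static void main(String[] args) {
        final Ruby ruby = Ruby.newInstance();

        // Ruby#newSymbol(String) misbehaves for characters outside lower-ASCII,
        // so create a properly-encoded RubyString first and intern it instead.
        final RubyString pipelineId = ruby.newString("ünïcödé-pipeline");
        final RubySymbol symbol = pipelineId.intern();

        // the resulting symbol carries the correct encoding for use as a metrics-store key
        System.out.println(symbol.getClass().getSimpleName()); // RubySymbol
    }
}
```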
* fix: use properly-encoded RubySymbol in PipelineConfig
We cannot rely on `RubySymbol#toString` to produce a properly-encoded `String`
when the string contains characters above the lower-ASCII plane because the
result is effectively a binary ruby-internal marshal of the bytes that only
holds when the symbol contains lower-ASCII.
Instead, we can use the internally-memoizing `RubySymbol#name` to get a
properly-encoded `RubyString`, and `RubyString#asJavaString()` to get a
properly-encoded java-`String`.
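And the reverse direction, reading a properly-encoded java `String` back out of the symbol (again a hedged sketch; the `name`/`asJavaString` calls follow the description above, and their exact signatures can vary across JRuby versions):

```java
import org.jruby.Ruby;
import org.jruby.RubyString;
import org.jruby.RubySymbol;
import org.jruby.runtime.ThreadContext;

public class UnicodeSymbolRead {
    public static void main(String[] args) {
        final Ruby ruby = Ruby.newInstance();
        final ThreadContext context = ruby.getCurrentContext();

        final RubySymbol symbol = ruby.newString("ünïcödé-pipeline").intern();

        // RubySymbol#toString leaks ruby-internal bytes for non lower-ASCII symbols;
        // go through Symbol#name to get a properly-encoded RubyString instead.
        final RubyString name = (RubyString) symbol.name(context);
        final String javaName = name.asJavaString();

        System.out.println(javaName); // ünïcödé-pipeline
    }
}
```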
* fix: properly serialize unicode pipeline names in API output
Jackson's JSON serializer leaks the JRuby-internal byte structure of Symbols,
which only aligns with the byte-structure of the symbol's actual string when
that string is wholly-comprised of lower-ASCII characters.
By pre-converting Symbols to Strings, we ensure that the result is readable
and useful.
* spec: bypass monitoring specs for unicode pipeline ids when PQ enabled
Add a buildkite pipeline to run benchmarks.
The script runs the benchmark by sending data through Filebeat (docker) -> Logstash (docker) -> ES Cloud.
Logstash metrics and benchmark results are sent to the same ES Cloud.
- Secrets are stored in vault at `secret/ci/elastic-logstash/benchmark`
- Use flog (docker) to generate ~2GB logs
- Pull the snapshot docker image of the main branch every day
- Logstash runs two pipelines, main and node_stats
- The main pipeline handles beats ingestion, sending data to the data stream `logs-generic-default`
- It runs all combinations of (pq, mq) x workers x batch sizes
- Each test runs for ~7 minutes
- The node_stats pipeline retrieves /_node/stats API every 30s and sends it to the data stream `metrics-nodestats-logstash`
- The script sends a summary of EPS and resource usage to index `benchmark_summary`
The buildkite pipeline accepts ENV variables to customize the test:
| Variable Name | Default Value | Comment |
|-----------------|---------------------|----------------------------------------------------|
| FB_VERSION | 8.13.4 | docker tag |
| LS_VERSION | | docker tag |
| LS_JAVA_OPTS | -Xmx2g | by default, Xmx is set to half of memory |
| MULTIPLIERS | 2,4,6 | determine the number of workers (cpu * multiplier) |
| BATCH_SIZES | 125,1000 | |
| CPU | 4 | number of cpu for Logstash container |
| MEM | 4 | number of GB for Logstash container |
| QTYPE | memory | queue type to test -- persisted; memory; all |
| FB_CNT | 4 | number of filebeats to use in benchmark |
To check the results:
- `vault read secret/ci/elastic-logstash/benchmark` to get the host and credentials
- `curl -u "$ES_USER:$ES_PW" "$ES_HOST/benchmark_summary/_search"`
Fixes: https://github.com/elastic/ingest-dev/issues/3377
* Rework the logic that deletes the eldest DLQ segments to be more resilient to file-not-found errors, and avoid logging warning messages about conditions the user cannot act on (see the sketch after this list).
* Fixed test case: when the path points to a file that doesn't exist, always rely on the path-name comparator. Reworked the code to simplify it, removing the need for the tri-state variable.
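The deletion path becomes resilient roughly as follows (an illustrative java.nio sketch with hypothetical names, not the actual DeadLetterQueueWriter code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

final class SegmentCleaner {
    /**
     * Deletes the given segment and returns the bytes reclaimed, or 0 when the segment
     * has already disappeared (e.g. removed by a reader configured with clean_consumed).
     */
    static long deleteSegment(final Path segment) throws IOException {
        try {
            final long size = Files.size(segment);
            Files.delete(segment);
            return size;
        } catch (NoSuchFileException e) {
            // the segment vanished out from under us; there is nothing the user
            // could do about it, so don't log a warning for this case
            return 0L;
        }
    }
}
```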
This is a refactoring of a test fixture.
Avoid mocking the value returned by the global SETTINGS constant; instead, use the local settings map instance used in subject creation.
* Add wolfi as an option to the build process
* Add docker acceptance tests for the wolfi image
* Change how tests are done on the java process, due to "ps -C" not being available on wolfi
replaces and closes https://github.com/elastic/logstash/pull/16116
Co-authored-by: Andres Rodriguez <andreserl@gmail.com>
* p2p: extract interface from v1 pipeline bus
* p2p: extract pipeline push to abstract
* p2p: add opt-in unblocked "v2" implementation
Adds a v2 implementation that does not synchronize on the sender so that
multiple workers can send events through a common `pipeline` output instance
simultaneously.
In this implementation, an `AddressStateMapping` provides synchronized
mutation and cleanup of the underlying `AddressState`, and allows only
queryable read-only views (`AddressState.ReadOnly`) of that mutable state to escape encapsulation.
The implementation also holds an identity-keyed mapping from `PipelineOutput`s
to the set of `AddressState.ReadOnly`s each is registered as a sender for, so
that they can be quickly resolved at runtime.
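A simplified sketch of that bookkeeping (the type parameters stand in for the real `PipelineOutput` and `AddressState.ReadOnly` classes; the actual implementation does more):

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// O stands in for PipelineOutput, A for AddressState.ReadOnly
final class SenderRegistry<O, A> {
    // identity-keyed, so two distinct output instances never share an entry
    // even if they were to compare equal()
    private final Map<O, Set<A>> addressesBySender =
            Collections.synchronizedMap(new IdentityHashMap<>());

    void register(final O sender, final A address) {
        addressesBySender
                .computeIfAbsent(sender, k -> ConcurrentHashMap.newKeySet())
                .add(address);
    }

    Set<A> addressesFor(final O sender) {
        return addressesBySender.getOrDefault(sender, Set.of());
    }
}
```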
* p2p: more tests for pipeline restart behaviour
* p2p: make v2 pipeline bus the default
Updates the DLQ reader to create a notification file (`.deleted_segment`) which signals when a segment is deleted as a consequence of `clean_consumed` being set. Updates the DLQ writer to use a filesystem watch so that it can receive the reader's signal and update the exposed metric, loading the size by listing the segments present on the filesystem.
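The general shape of such a watch (a standalone java.nio sketch, not the actual DeadLetterQueueWriter implementation; segment naming and metric plumbing are simplified):

```java
import java.io.IOException;
import java.nio.file.*;

final class DlqSizeWatcher {
    // watch the DLQ directory for the reader's ".deleted_segment" notification,
    // then recompute the exposed size metric from the segments still on disk
    static void watch(final Path dlqDir) throws IOException, InterruptedException {
        try (WatchService watcher = dlqDir.getFileSystem().newWatchService()) {
            dlqDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                final WatchKey key = watcher.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (((Path) event.context()).toString().endsWith(".deleted_segment")) {
                        System.out.println("segments now occupy " + segmentsSize(dlqDir) + " bytes");
                    }
                }
                key.reset();
            }
        }
    }

    private static long segmentsSize(final Path dlqDir) throws IOException {
        long total = 0L;
        try (DirectoryStream<Path> segments = Files.newDirectoryStream(dlqDir, "*.log")) {
            for (Path segment : segments) {
                total += Files.size(segment);
            }
        }
        return total;
    }
}
```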
The PR was created to skip resolving environment variable references in comments present in the `config.string` pipelines defined in the `pipelines.yml` file.
However it introduced a bug that no longer resolves env var references in values of settings like pipeline.batch.size or queue.max_bytes.
For now we’ll revert this PR and create a fix that handles both problems.
Updates the plain, json and pipeline appenders in the default `config/log4j2.properties` to define a delete rule, executed during the rollover strategy, which removes compressed log archives older than 7 days.
Updates the documentation that describes the logging configuration to explain how the file rollover works, how to configure the strategy, and in particular how to set up a space-limitation condition on the rollover.
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
* pq: avoid blocking writer when queue is precisely full
A PQ is considered full (and therefore needs to block before releasing the
writer) when its persisted size on disk _exceeds_ its `queue.max_bytes`
capacity.
This removes an edge-case preemptive block when the persisted size after
writing an event _meets_ its `queue.max_bytes` precisely AND its current
head page has insufficient room to also accept a hypothetical future event.
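In other words (an illustrative check only; the names below do not match the actual Queue internals, and `queue.max_bytes == 0` is assumed to mean unbounded), fullness now uses a strict greater-than comparison rather than greater-or-equal combined with the head-page heuristic:

```java
final class QueueFullness {
    // illustrative only: block the writer solely when the on-disk size strictly
    // exceeds queue.max_bytes; merely reaching the limit no longer blocks,
    // even when the head page has little room left
    static boolean isFull(final long persistedSizeInBytes, final long maxBytes) {
        return maxBytes > 0 && persistedSizeInBytes > maxBytes;
    }
}
```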
Fixes: elastic/logstash#16172
* docs: PQ `queue.max_bytes` cannot be less than `queue.page_capacity`