Commit graph

1649 commits

Author SHA1 Message Date
github-actions[bot]
e854ac7bf5
Backport PR #16423 to 8.x: DLQ-ing events that trigger a conditional evaluation error. (#16493)
* DLQ-ing events that trigger a conditional evaluation error. (#16423)

When a conditional evaluation encounters an error in the expression, the event that triggered the issue is sent to the pipeline's DLQ, if enabled for the executing pipeline.

This PR builds on the work done in #16322: the `ConditionalEvaluationListener`, which receives notifications about if-statement evaluation failures, is improved to also send the event to the DLQ (if enabled in the pipeline) rather than just logging it.

(cherry picked from commit b69d993d71)

* Fixed warning about non-serializable field DeadLetterQueueWriter in serializable AbstractPipelineExt

---------

Co-authored-by: Andrea Selva <selva.andre@gmail.com>
2024-10-08 13:45:15 +01:00
github-actions[bot]
d1155988c1
Improve pipeline bootstrap error logs (#16495) (#16504)
This PR adds the details of the underlying cause errors to the pipeline converge state error logs.

(cherry picked from commit e84fb458ce)

Co-authored-by: Edmo Vamerlatti Costa <11836452+edmocosta@users.noreply.github.com>
2024-10-03 13:31:22 +02:00
github-actions[bot]
eafcf577dd
Change LogStash::Util::SubstitutionVariables#replace_placeholders refine argument to optional (#16485) (#16488)
(cherry picked from commit 8368c00367)

Co-authored-by: Edmo Vamerlatti Costa <11836452+edmocosta@users.noreply.github.com>
2024-10-01 12:16:03 -07:00
github-actions[bot]
14f52c0472
Fixes the issue where LS wipes out all quotes from docker env variables. (#16456) (#16459)
* Fixes the issue where LS wipes out all quotes from docker env variables. This is an issue when running LS on docker with CONFIG_STRING, where the quotes in the env variable need to be kept.

* Add a docker acceptance integration test.

(cherry picked from commit 7c64c7394b)

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
2024-09-17 07:30:45 -07:00
github-actions[bot]
5ef86a8aa1
Fix ConditionalEvaluationError to not include the event that errored in its serialized form, because it's not expected that this class is ever serialized. (#16429) (#16430)
Make the inner field of ConditionalEvaluationError transient so that it is skipped during serialization.

(cherry picked from commit bb7ecc203f)

Co-authored-by: Andrea Selva <selva.andre@gmail.com>
2024-09-06 11:40:42 +01:00
Andrea Selva
b88e23702c
Implements safe evaluation of conditional expressions, logging the error without killing the pipeline (#16322)
This PR protects if statements against expression evaluation errors: the event under processing is cancelled and logged.
This avoids crashing a pipeline that encounters a runtime error during event condition evaluation, and makes it possible to debug the root cause by reporting the offending event and removing it from the current processing batch.

Translates the `org.jruby.exceptions.TypeError`, `IllegalArgumentException`, and `org.jruby.exceptions.ArgumentError` that could happen during `EventCondition` evaluation into a custom `ConditionalEvaluationError`, which bubbles up through the AST tree nodes. It is caught in the `SplitDataset` node.
Updates the generation of the `SplitDataset` so that the execution of the `filterEvents` method inside the compute body is guarded by a try-catch and defers to an instance of `AbstractPipelineExt.ConditionalEvaluationListener` to handle such errors. In this particular case, the error handling consists of just logging the offending Event.


---------

Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
2024-09-05 10:57:10 +02:00
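A minimal Ruby sketch of the guarded evaluation described in #16322, not the generated Java code itself. The `split_batch` helper, the listener lambda, and the Ruby `ConditionalEvaluationError` class are hypothetical stand-ins; only the idea (translate the evaluation error, hand the offending event to a listener, and drop it from the batch) comes from the commit message.

```ruby
# Hypothetical stand-in for the custom error that carries the offending event.
class ConditionalEvaluationError < StandardError
  attr_reader :event

  def initialize(cause, event)
    super("condition evaluation failed: #{cause.message}")
    @event = event
  end
end

# Splits a batch into events that satisfy the condition and events that do not.
# An event whose evaluation raises is reported to the listener and left out of
# both branches, so the rest of the batch keeps flowing.
def split_batch(events, condition, listener)
  matched = []
  unmatched = []
  events.each do |event|
    begin
      (condition.call(event) ? matched : unmatched) << event
    rescue TypeError, ArgumentError => e
      listener.call(ConditionalEvaluationError.new(e, event))
    end
  end
  [matched, unmatched]
end

errors = []
listener = ->(error) { errors << error } # here the "handling" is just recording
condition = ->(event) { Integer(event["status"]) >= 500 }
events = [{ "status" => "503" }, { "status" => "oops" }, { "status" => "200" }]

matched, unmatched = split_batch(events, condition, listener)
```

In this sketch the event with the non-numeric status ends up only in `errors`, while the other two events are routed normally.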
Andrea Selva
ac034a14ee
Generate Dataset code with meaningful fields names (#16386)
This PR is intended to help Logstash developers or users that want to better understand the code that's autogenerated to model a pipeline, assigning more meaningful names to the Datasets subclasses' fields.

Updates `FieldDefinition` to receive the name of the field from construction methods, so that it can be used during the code generation phase, instead of the existing incremental `field%n`.
Updates `ClassFields` to propagate the explicit field name down to the `FieldDefinitions`.
Updates the `DatasetCompiler`, which adds fields to `ClassFields`, to assign a proper name to the generated Dataset's fields.
2024-09-04 11:10:29 +02:00
Mashhur
e104704830
Exclude substitution refinement on pipelines.yml (#16375)
* Exclude substitution refinement on pipelines.yml (applies on ENV vars and logstash.yml where env2yaml saves vars)

* Safety integration test for a pipeline whose config.string contains ENV.
2024-08-09 09:33:01 -07:00
Ry Biesemeyer
3d13ebe33e
deprecate java less-than 17 (#16370) 2024-08-09 08:58:11 +01:00
Mashhur
62ef8a0847
[Bugfix] Resolve the array and char (single | double quote) escaped values of ${ENV} (#16365)
* Properly resolve the values from ENV vars if literal array string provided with ENV var.

* Docker acceptance test for persisting keys and use actual values in docker container.

* Review suggestion.

Simplify the code by stripping whitespace before `gsub`, no need to check comma and split.

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

---------

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
2024-08-06 11:09:26 -07:00
Ry Biesemeyer
c633ad2568
settings: add support for observing settings after post-process hooks (#16339)
Because logging configuration occurs after loading the `logstash.yml`
settings, deprecation logs from `LogStash::Settings::DeprecatedAlias#set` are
effectively emitted to a null logger and lost.

By re-emitting after the post-process hooks, we can ensure that they make
their way to the deprecation log. This change adds support for any setting
that responds to `Object#observe_post_process` to receive it after all
post-processing hooks have been executed.

Resolves: elastic/logstash#16332
2024-07-24 10:22:34 +01:00
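A minimal sketch of the opt-in hook described in #16339, under stated assumptions: `DeprecatedAliasSetting`, `PlainSetting`, and `run_post_process` below are hypothetical illustrations, not the Logstash classes. Only the shape comes from the commit message: a setting that responds to `observe_post_process` is notified after all post-process hooks have run, so a notice produced too early can be re-emitted once logging is ready.

```ruby
# Hypothetical setting that holds back its deprecation notice until observed.
class DeprecatedAliasSetting
  def initialize(name, replacement, deprecation_log)
    @name = name
    @replacement = replacement
    @deprecation_log = deprecation_log
    @pending_notice = nil
  end

  # Called while settings are loaded, before logging is configured.
  def set(value)
    @value = value
    @pending_notice = "`#{@name}` is deprecated, use `#{@replacement}` instead"
  end

  # Called once after all post-process hooks have run.
  def observe_post_process
    return if @pending_notice.nil?

    @deprecation_log << @pending_notice
    @pending_notice = nil
  end
end

# Hypothetical setting that does not implement the optional hook.
class PlainSetting
  def set(value)
    @value = value
  end
end

# Runs the hooks first, then notifies every setting that opts in.
def run_post_process(settings, hooks)
  hooks.each(&:call)
  settings.each do |setting|
    setting.observe_post_process if setting.respond_to?(:observe_post_process)
  end
end

deprecation_log = []
old_setting = DeprecatedAliasSetting.new("old.name", "new.name", deprecation_log)
plain_setting = PlainSetting.new
old_setting.set(1)
plain_setting.set(2)
emitted_before_hooks = deprecation_log.dup

run_post_process([old_setting, plain_setting], [-> { :logging_configured }])
```

Using `respond_to?` keeps the hook optional, so settings without it need no changes.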
João Duarte
8f2dae618c
correctly handle stack overflow errors during pipeline compilation (#16323)
This commit improves error handling when pipelines that are too big hit the Xss limit and throw a StackOverflowError. Currently the exception is printed outside of the logger, and doesn’t even show if log.format is json, leaving the user to wonder what happened.

A couple of thoughts on the way this is implemented:

* There should be a first barrier to handle pipelines that are too large based on the PipelineIR compilation. The barrier would use the detection of Xss to determine how big a pipeline could be. This however doesn't reduce the need to still handle a StackOverflow if it happens.
* The catching of StackOverflowError could also be done on the WorkerLoop. However I'd suggest that this is unrelated to the Worker initialization itself, it just so happens that compiledPipeline.buildExecution is computed inside the WorkerLoop class for performance reasons. So I'd prefer logging to not come from the existing catch, but from a dedicated catch clause.

Solves #16320
2024-07-18 10:08:38 +01:00
Ry Biesemeyer
66aeeeef83
Json normalization performance (#16313)
* licenses: allow elv2, standard abbreviation for Elastic License version 2

* json-dump: reduce unicode normalization cost

Since the underlying JrJackson now properly (and efficiently) encodes the
UTF-8 transcode of whichever strings it is given, we no longer need to
pre-normalize to UTF-8 in ruby _except_ when the string is flagged as BINARY
because we have alternate behaviour to preserve valid UTF-8 sequences.

By emitting a _copy_ of binary-flagged strings that have been re-flagged as
UTF-8, we allow the downstream (efficient) encoding operation in jrjackson
to produce equivalent behaviour at much lower cost.

* cleanup: remove orphan unicode normalizer
2024-07-09 14:12:21 -07:00
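The re-flagging idea from #16313 can be sketched in plain Ruby. The helper name `normalize_for_json` is hypothetical and the real code path in Logstash may differ; the sketch only shows that a BINARY-flagged string is copied and re-flagged as UTF-8, while every other string is passed through untouched.

```ruby
# Returns the string itself unless it is flagged as BINARY; in that case a
# copy re-flagged as UTF-8 is returned and the original is left unmodified.
def normalize_for_json(string)
  return string unless string.encoding == Encoding::BINARY

  string.dup.force_encoding(Encoding::UTF_8)
end

# Bytes of "café" encoded as UTF-8; pack("C*") yields a BINARY-flagged string.
binary = [0x63, 0x61, 0x66, 0xC3, 0xA9].pack("C*")
utf8 = "plain"

normalized = normalize_for_json(binary)
untouched = normalize_for_json(utf8)
```

No bytes are transcoded here; only the encoding flag on the copy changes, which is why the operation is cheap.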
João Duarte
121b1c9632
update jruby to 9.4.8.0 (#16278)
https://www.jruby.org/2024/07/02/jruby-9-4-8-0.html

> Fixed a bug in the bytecode JIT causing patterns to execute incorrect branches. #8283, #8284
> jruby-openssl is updated to 0.15.0, with updated Bouncy Castle libraries to avoid CVEs in older versions.
> uri is updated to 0.12.2, mitigating CVE
> net-ftp is updated to 0.3.7 with restored functionality on JRuby.

Exhaustive test suite: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/580
2024-07-02 19:57:55 +01:00
Edmo Vamerlatti Costa
784fa186c8
Ensure pipeline metrics are cleared on the pipeline shutdown (#16264)
This commit fixed the configuration reload process to clean up the pipeline's metric store, so it does not retain references to failed pipelines' components.
2024-06-28 13:13:39 +02:00
Mashhur
0cfe6b0801
Add RubyEvent#dup support and unit test case to keep Json#dump(Event) safe. (#16255)
* Add RubyEvent#dup support and unit test case to keep Json#dump(Event) safe.


Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>

---------

Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
2024-06-27 13:08:56 -07:00
Ry Biesemeyer
0ec16ca398
Unicode pipeline and plugin ids (#15971)
* fix: restore support for unicode pipeline- and plugin-id's

JRuby's `Ruby#newSymbol(String)` throws an exception when provided a `String`
that contains characters outside of lower-ASCII because JRuby internals expect
"the incoming String to be one of our mangled ISO-8859-1 strings" as noted in
a comment on jruby/jruby#6217.

Instead, we use `Ruby#newString(String)` to create a new `RubyString` (which
works properly), and then rely on `RubyString#intern` to get our `RubySymbol`.

This fixes a regression introduced in the 8.7 series in which pipeline id's
are consistently represented as ruby symbols in the metrics store, and ensures
similar issue does not exist when specifying a plugin id that contains
characters above the lower-ASCII plane.

* fix: use properly-encoded RubySymbol in PipelineConfig

We cannot rely on `RubySymbol#toString` to produce a properly-encoded `String`
when the string contains characters above the lower-ASCII plane because the
result is effectively a binary ruby-internal marshal of the bytes that only
holds when the symbol contains lower-ASCII.

Instead, we can use the internally-memoizing `RubySymbol#name` to get a
properly-encoded `RubyString`, and `RubyString#asJavaString()` to get a
properly-encoded java-`String`.

* fix: properly serialize unicode pipeline names in API output

Jackson's JSON serializer leaks the JRuby-internal byte structure of Symbols,
which only aligns with the byte-structure of the symbol's actual string when
that string is wholly-comprised of lower-ASCII characters.

By pre-converting Symbols to Strings, we ensure that the result is readable
and useful.

* spec: bypass monitoring specs for unicode pipeline ids when PQ enabled
2024-06-25 08:35:28 -07:00
Ry Biesemeyer
92909cb1c4
json: remove unnecessary dup/freeze in serialization (#16213) 2024-06-20 09:15:49 -07:00
Andrea Selva
321e407e53
Avoid logging file-not-found errors when DLQ segments are removed concurrently between writer and reader. (#16204)
* Rework the logic that deletes the eldest DLQ segments to be more resilient to file-not-found errors and avoid logging warning messages that the user cannot act on.

* Fixed test case: when the path points to a file that doesn't exist, always rely on the path name comparator. Reworked the code to simplify it, no longer needing the tri-state variable.
2024-06-20 08:52:19 -07:00
Andrea Selva
ed930f820d
Avoid mocking the value returned in global SETTINGS constant. (#16245)
This is a refactoring of a test fixture.
Avoid mocking the value returned in global SETTINGS constant. Use instead the local setting map instance used in subject creation.
2024-06-20 14:25:53 +02:00
Ry Biesemeyer
0f6fa5c8fb
p2p: adds opt-in pipeline bus with less synchronization (#16194)
* p2p: extract interface from v1 pipeline bus

* p2p: extract pipeline push to abstract

* p2p: add opt-in unblocked "v2" implementation

Adds a v2 implementation that does not synchronize on the sender so that
multiple workers can send events through a common `pipeline` output instance
simultaneously.

In this implementation, an `AddressStateMapping` provides synchronized
mutation and cleanup of the underlying `AddressState`, and allows only
queryable immutable views (`AddressState.ReadOnly`) to escape encapsulation.

The implementation also holds an identity-keyed mapping from `PipelineOutput`s
to the set of `AddressState.ReadOnly`s it is registered as a sender for so
that they can be quickly resolved at runtime.

* p2p: more tests for pipeline restart behaviour

* p2p: make v2 pipeline bus the default
2024-06-17 07:35:54 -07:00
Andrea Selva
fab345881a
Introduce filesystem signalling from DLQ reader to writer to update byte size metric accordingly when the reader uses clean_consumed (#16195)
Updates the DLQ reader to create a notification file (`.deleted_segment`) which signals when a segment is deleted as a consequence of `clean_consumed` being set. Updates the DLQ writer to set up a filesystem watch so that it can receive the reader's signal and update the exposed metric, loading the size by listing the space occupied by the segments on the filesystem.
2024-06-17 14:27:39 +02:00
Andrea Selva
efa83787a5
Revert PR #16050
The PR was created to skip resolving environment variable references in comments present in the “config.string” pipelines defined in the pipelines.yml file.
However it introduced a bug that no longer resolves env var references in values of settings like pipeline.batch.size or queue.max_bytes.
For now we’ll revert this PR and create a fix that handles both problems.
2024-06-06 20:24:45 +01:00
Ry Biesemeyer
ea930861ef
PQ: avoid blocking writer when precisely full (#16176)
* pq: avoid blocking writer when queue is precisely full

A PQ is considered full (and therefore needs to block before releasing the
writer) when its persisted size on disk _exceeds_ its `queue.max_bytes`
capacity.

This removes an edge-case preemptive block when the persisted size after
writing an event _meets_ its `queue.max_bytes` precisely AND its current
head page has insufficient room to also accept a hypothetical future event.

Fixes: elastic/logstash#16172

* docs: PQ `queue.max_bytes` cannot be less than `queue.page_capacity`
2024-05-22 08:23:18 -07:00
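The distinction between "exceeds" and "meets" in #16176 can be illustrated with a tiny predicate. This is a sketch of the rule as worded in the commit message, not the actual queue implementation; `queue_full?` and its parameters are hypothetical names.

```ruby
# A queue counts as full only when the persisted size exceeds the capacity.
# A queue whose persisted size exactly meets the capacity is not full yet.
def queue_full?(persisted_bytes, max_bytes)
  persisted_bytes > max_bytes
end

max_bytes = 1024
precisely_full = queue_full?(1024, max_bytes) # size meets the limit
over_capacity = queue_full?(1025, max_bytes)  # size exceeds the limit
```

With a `>=` comparison the first case would block the writer even though no further event has been offered, which is the edge case the commit removes.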
Mashhur
979d30d701
Handle non-unicode payload in Logstash. (#16072)
* A logic to handle non-unicode payload in Logstash.

* Well tested and code organized version of the logic.

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>

* Upgrade jrjackson to 0.4.20

* Code review: simplify the logic with a standard String#encode interface with replace option.

Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>

---------

Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <ry.biesemeyer@elastic.co>
2024-05-16 10:42:06 -07:00
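The code review item in #16072 mentions the standard `String#encode` interface with the replace option. A minimal sketch of that interface follows; the `to_utf8` helper and the choice of `?` as the replacement are assumptions for illustration, not necessarily what Logstash uses.

```ruby
# Transcodes a payload to UTF-8, substituting "?" for anything that cannot
# be converted instead of raising an encoding error.
def to_utf8(payload)
  payload.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

# The bytes "caf" followed by 0xE9, which is "é" in ISO-8859-1.
raw = [0x63, 0x61, 0x66, 0xE9].pack("C*")

# Flagged as ISO-8859-1, the last byte has a defined UTF-8 equivalent.
from_latin1 = to_utf8(raw.dup.force_encoding(Encoding::ISO_8859_1))

# Flagged as BINARY, the last byte has no defined conversion and is replaced.
from_binary = to_utf8(raw)
```

Both results are valid UTF-8 strings, so downstream JSON serialization does not have to deal with broken byte sequences.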
Mashhur
734405dcbe
Replace stack traces with logger in DSL. (#16159) 2024-05-13 13:45:56 -07:00
Jonas L. B
0d6ba8d1bd
Allow comments in hashes and before EOF (#16058)
In the grammar definitions for hashes, `whitespace` was replaced with `cs` to allow either whitespace _or_ comments.
Additionally, the grammar definition for comments previously had to end with a newline; now it can end with a newline _or_ EOF, using the "not anything" treetop rule `!.`.

Co-authored-by: Jonas Lundholm Bertelsen <jonas.lundholm.bertelsen@beumer.com>
2024-05-08 14:07:26 +02:00
João Duarte
3068934c6f
upgrade java_input_example to 1.0.3 (#16152)
follow up to logstash-plugins/logstash-input-java_input_example@090142d
2024-05-08 13:05:42 +01:00
João Duarte
0d6117173f
update multiple dependencies (#16136)
This upgrades multiple java libraries:

* snakeyaml
* shadow
* gradle
* guava
* commons-io
* commons-logging
* commons-codec
* commons-compress
* commons-lang3
* commons-csv
* log4j
* google-java-format
* httpclient
* httpcore
* javassist
* jackson
* jackson-databind
* wiremock-standalone

Gems:

* rack
* sinatra
* octokit
* gems
* rake
* webmock

Also upgrades Java to 17.0.11+9.

Leftover upgrades:

* commons-csv 1.8 breaks license checker
* janino 3.1.12 breaks java tests
* log4j 2.21.0 breaks java compilation
2024-05-08 09:13:41 +01:00
Ry Biesemeyer
9e452d2e54
Update junit 4 13 (#16138)
* test-deps: update junit to latest 4.13

* test-deps: address deprecation of ExpectedException

* test-deps: use org.junit.Assert.assertThrows
2024-05-03 13:49:16 -07:00
Andrea Selva
830733d758
Provide opt-in flag to avoid fields name clash when log format is json (#15969)
Adds the log.format.json.fix_duplicate_message_fields feature flag to rename clashing fields when the json logging format (log.format) is selected.
If two message fields clash in a structured log message, the second is renamed by appending a _1 suffix to the field name.
By default the feature is disabled and requires the user to explicitly enable the behaviour.

Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
2024-04-17 16:37:05 +02:00
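A rough sketch of the renaming rule from #15969, assuming a structured log entry is built from a message plus a map of extra fields. `merge_log_fields` is a hypothetical helper; the real layout code is not shown in the commit message and may handle more cases.

```ruby
# Builds the field map for a structured log line. When an extra field would
# clash with a field that is already present, it gets a "_1" suffix instead
# of producing a duplicate key.
def merge_log_fields(message, extra_fields)
  fields = { "message" => message }
  extra_fields.each do |key, value|
    key = key.to_s
    key = "#{key}_1" if fields.key?(key)
    fields[key] = value
  end
  fields
end

log_fields = merge_log_fields(
  "pipeline started",
  { "message" => "inner message", "pipeline_id" => "main" }
)
```

Here the second `message` is kept as `message_1`, so both values survive in the JSON output.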
Mashhur
9483ee04c6
Fix the exception behavior when config.string contains ${VAR} in the comments. (#16050)
* Wipe out comment lines if config comment contains.

* Remove substitution var process when loading the YAML, instead align on the generic approach which LSCL happens during the pipeline compile.

* Update logstash-core/src/main/java/org/logstash/config/ir/PipelineConfig.java

Put the logging config back as it is being used with composed configs.
2024-04-11 07:32:28 -07:00
Andrea Selva
afa646fbcb
Introduce a new setting to give preference to Java heap or direct space buffer allocation type (#16054)
Introduce a new setting named `pipeline.buffer.type`, which can be set to `direct` or `heap`; `heap` enables allocation on the Java heap.
The processing of the setting is done in `LogStash::Runner#execute` and sets the Java property considered by Netty to disable the direct allocation: `io.netty.noPreferDirect`.
However, if that system property is already configured explicitly by the user (because it is set in `jvm.options` or `LS_JAVA_OPTS`), the setting is not applied and a warning is logged, respecting the user's will.

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
2024-04-10 15:23:47 +02:00
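The precedence rule in #16054 can be sketched with a plain hash standing in for the JVM system properties. `apply_buffer_type` is a hypothetical helper, and mapping `direct` to `"false"` is an assumption; the commit message only states that the property is set to disable direct allocation and that an explicit user value wins.

```ruby
NETTY_NO_PREFER_DIRECT = "io.netty.noPreferDirect"

# Returns the properties to use. An explicitly configured value is respected
# and a warning is recorded; otherwise the property is derived from the
# pipeline.buffer.type value ("heap" or "direct").
def apply_buffer_type(buffer_type, system_properties, warnings)
  if system_properties.key?(NETTY_NO_PREFER_DIRECT)
    warnings << "#{NETTY_NO_PREFER_DIRECT} is already set, ignoring pipeline.buffer.type"
    return system_properties
  end

  # Assumption: "heap" disables the preference for direct buffers.
  system_properties.merge(NETTY_NO_PREFER_DIRECT => (buffer_type == "heap").to_s)
end

warnings = []
from_setting = apply_buffer_type("heap", {}, warnings)
user_defined = apply_buffer_type("heap", { NETTY_NO_PREFER_DIRECT => "false" }, warnings)
```

In the second call the user's value is kept unchanged and a single warning is collected.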
Andrea Selva
6a04854e4c
JDK 21 move (#15719)
Adaptations to run Logstash on JDK 21:

- Java 8 support is obsolete and will be removed.
- Thread's `getId` (not final) replaced by final `threadId` https://bugs.openjdk.org/browse/JDK-8017617
- Verify the "this-escape" warnings raised when a constructor uses another method or passes the `this` reference to other methods https://bugs.openjdk.org/browse/JDK-8015831
- URL constructor is deprecated, use `<uri_instance>.toURL()` (since JDK 20)
- Manages the new (since JDK 20) `G1 Concurrent GC` MX Bean, [ref](https://github.com/elastic/logstash/pull/15719#issuecomment-1946367785)
2024-04-03 17:08:12 +02:00
João Duarte
59bd376360
upgrade ruby-maven-libs to 3.8.9 (#15894)
Given that JRuby comes with ruby-maven-libs 3.3.9 this commit upgrades the gem to 3.8.9 and ensures files from 3.3.9 are not included in the distribution.
2024-03-18 14:30:13 +01:00
carrychair
d1e624b81c
remove repetitions of "the" word (#15987)
Signed-off-by: carrychair <linghuchong404@gmail.com>
2024-03-17 10:53:59 +00:00
Andrea Selva
834c779c5a
Remove dlq writer dead code add some logs (#15965)
Remove unused method createFlushScheduler and add some debug logs to the finalizeSegment method to better follow what happens during its execution in case of problems.
2024-02-22 14:06:53 +01:00
Andrea Selva
18bbb3156c
Add shutdown step of DLQ flusher scheduled service (#15964)
This PR adds a shutdown method to the SchedulerService class used to handle actions to be executed on a certain cadence. In particular, it is used to execute the scheduled finalization of the DLQ head segment.
Updates the close method of the DLQ writer to invoke this additional shutdown on the service instance.
2024-02-22 12:33:23 +01:00
Andrea Selva
ff37e1e0d3
Fix failing DLQ test due to time scheduling (#15960)
Adds a condition that burns time to avoid a timing collision which, under certain circumstances, would fail the test.
The sealing of a segment happens if the segment is considered as stale, which requires 2 conditions:

- the segment must have received a write.
- the time of the last write must exceed the flush interval.

In this failing test, the flush interval is set to ZERO because of the synchronicity of the test, to avoid time dependency. However, with coarse-grained timer resolution, it could happen that the last write coincides with the time of the stale check, so the seal condition fails.
2024-02-22 11:28:31 +01:00
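The two staleness conditions in #15960, and the collision that made the test fail, can be reproduced with a small predicate. This is a sketch with hypothetical names and integer millisecond timestamps, not the DLQ writer's code.

```ruby
# A segment is stale when it has received a write and the time elapsed since
# the last write exceeds the flush interval. All values are milliseconds.
def stale?(last_write_millis, flush_interval_millis, now_millis)
  return false if last_write_millis.nil? # no write received yet

  (now_millis - last_write_millis) > flush_interval_millis
end

flush_interval = 0

# Coarse timer: the stale check observes the same timestamp as the write.
same_tick = stale?(1_000, flush_interval, 1_000)

# After burning at least one tick of time, the check succeeds.
next_tick = stale?(1_000, flush_interval, 1_001)

never_written = stale?(nil, flush_interval, 1_001)
```

With a zero interval the check still needs strictly positive elapsed time, which is why the test waits until the clock has advanced.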
Ry Biesemeyer
38e8c5d3f9
flow_metrics: pull worker_utilization up to pipeline-level (#15912) 2024-02-06 11:50:34 -08:00
Dimitrios Liappis
c33afd4cd0
Mute DLQ test on Windows (#15843)
This commit mutes the DLQ test:
`testDLQWriterFlusherRemovesExpiredSegmentWhenCurrentHeadSegmentIsEmpty`
when running on Windows.

Closes https://github.com/elastic/logstash/issues/15768
2024-01-24 09:53:56 +02:00
Pavel Zorin
2c83a52380
[CI] Send Java and ruby tests to sonarqube simultaneously (#15810)
* Ruby code coverage with SimpleCov json formatter

* [CI] Send Java and ruby tests to sonarqube simultaneously

* Enabled COVERAGE for ruby tests

* Added compiled classes to artifacts

* Test change

* Removed test changes

* Returned back ENABLE_SONARQUBE condition

* Removed debug line

* Disable Ruby coverage if ENABLE_SONARQUBE is not true

* Run sonar scan on pull requests and on push to main

* Run sonar scan on release branches
2024-01-17 19:04:37 +00:00
Andrea Selva
50a589493c
Bump Puma lower version constraint to >= 6.4.2 (#15773) 2024-01-10 09:52:56 +00:00
Edmo Vamerlatti Costa
a21ced0946
Add system properties to configure Jackson's stream read constraints (#15720)
This commit added a few jvm.options properties to configure the Jackson read constraints defaults (Maximum Number value length, Maximum String value length, and Maximum Nesting depth).
2024-01-08 17:48:11 +01:00
Edmo Vamerlatti Costa
41ec183f09
Fix logstash-keystore multiple keys operations with command flags (#15737)
This commit fixes how the keystore tool handles the command's options, including validation for unknown options, and adding the --stdin flag to the add command.
2024-01-03 18:57:57 +01:00
Andrea Selva
48b0af1206
Replace Gradle's report.enabled setting to report's required property (#15706) 2023-12-20 12:40:10 +01:00
Andrea Selva
6b22cc8dd1
Separate scheduling of segments flushes from time (#15680)
Introduces a new interface named SchedulerService to abstract from the ScheduledExecutorService to execute the DLQ flushes of segments. Abstracting from time provides a benefit in testing, where the test doesn't have to wait for things to happen, but those things could happen synchronously.
2023-12-18 11:07:02 +01:00
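The benefit described in #15680 (tests no longer wait for time to pass) can be sketched as follows. `ManualSchedulerService` and `SegmentWriter` are hypothetical Ruby illustrations of the idea; the actual change is a Java interface named SchedulerService whose methods are not listed in the commit message.

```ruby
# Test-oriented scheduler: the registered action runs only when tick is
# called, so a test can trigger a "scheduled" flush synchronously.
class ManualSchedulerService
  def initialize
    @action = nil
  end

  def repeated_action(&action)
    @action = action
  end

  def tick
    @action&.call
  end

  def shutdown
    @action = nil
  end
end

# Hypothetical writer that depends on the scheduler abstraction rather than
# on a timer, counting how many flushes were requested.
class SegmentWriter
  attr_reader :flush_count

  def initialize(scheduler)
    @flush_count = 0
    @scheduler = scheduler
    @scheduler.repeated_action { @flush_count += 1 }
  end

  def close
    @scheduler.shutdown
  end
end

scheduler = ManualSchedulerService.new
writer = SegmentWriter.new(scheduler)
scheduler.tick
scheduler.tick
flushes_before_close = writer.flush_count

writer.close
scheduler.tick # no effect after shutdown
```

A production implementation would presumably wrap a timer-based executor behind the same methods, while tests inject the manual variant.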
Andrea Selva
eddd91454f
Shutdown DLQ segments flusher only if it has been started (#15649)
In DLQ unit testing, the DLQ writer is sometimes started explicitly without starting the segments flusher. In such cases the test's logs contain exceptions, which could lead one to think that the test fails silently.

Avoid invoking scheduledFlusher's shutdown when it's not started (such behaviour is present only in tests).
2023-12-05 09:06:24 +01:00
kaisecheng
05392ad16e
Added missing method of logger wrapper for puma (#15640)
This commit fixes a no-method error raised when the node stats API gets an
invalid API path, which triggers puma to print the error using stderr

Fix: #15639
2023-11-30 13:53:18 +00:00
Edmo Vamerlatti Costa
5543e3c3b2
Add support to add and remove multiple keystore keys in a single operation (#15612)
This commit added support to add and remove multiple keystore keys in a single operation. It also fixed the empty value validation for editing existing key values and added ASCII validation for values.
2023-11-30 10:21:51 +01:00